This is nothing more than an outline for the time being. It’s not anything sophisticated; I just write comments that keep me on track and help me stay focused for when I go to re-architect OSIB. This is pre‑alpha. Don’t expect polish; expect sane defaults. The architecture of OSINTBuddy is currently kinda crappy. I’ve kept it boring for prototyping and exploration purposes, but now it’s time for a slight upgrade…

OSIB (aka OSINTBuddy) requires running potentially untrusted data transforms, preserving provenance, and making relationships explorable. We’ll need isolation, traceability, and a UI that reflects relationships over time.


  1. Auditability and time travel through event sourcing belong only where they can provide real value: cases. That means temporal capabilities, auditability, and traceability for case events. (A case consists of entities and the relationships between those entities; cases can be displayed as a graph or, in the future, as a table too.) We’re also likely going to ditch Apache AGE for plain PostgreSQL 17 relational tables.

  2. We can currently only run one entity transform at a time. A RabbitMQ queue feeding Rust workers should help out here.

  3. We need a strong, secure default for a tool that may run code from a mix of first-party and untrusted third-party plugins. Firecracker microVMs should make easy work of this.

  4. to be continued…

Backend services

  • API: keep it thin. Validate input, enqueue jobs, expose read models, publish events over WebSockets. No heroics. HTTP API for everything else.
  • Worker: does the dangerous, long‑running transform/data collection stuff. Reads AMQP, starts a Firecracker microVM per job, collects outputs, persists results and case events, tears down cleanly.
  • Database: Normal PostgreSQL 17 relational tables plus a case‑scoped event log. That’s it.
  • Queue: Separate “ask” from “do” and provide backpressure. Useful, not optional.
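
To make the “ask” versus “do” split a bit more concrete, here is a minimal sketch of backpressure using a bounded in-process queue. It is only a stand-in: the real queue would be RabbitMQ and the real worker would be Rust, and the names `enqueue_transform`, `worker`, the job fields, and the capacity of 2 are all made up for illustration.

```python
import queue
import threading

# In-process stand-in for the AMQP queue (assumption: capacity 2 is arbitrary).
jobs = queue.Queue(maxsize=2)
results = []


def enqueue_transform(entity_id: str, transform: str) -> bool:
    """The 'ask' side: accept the job, or report backpressure without blocking."""
    try:
        jobs.put_nowait({"entity_id": entity_id, "transform": transform})
        return True
    except queue.Full:
        return False  # the API could tell the client to retry later


def worker() -> None:
    """The 'do' side: drain jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        # A real worker would boot a microVM here; this only records the job.
        results.append(f"{job['transform']}:{job['entity_id']}")


# Three requests arrive before any worker is running; the third is refused.
accepted = [enqueue_transform(f"e{i}", "whois") for i in range(3)]

thread = threading.Thread(target=worker)
thread.start()
jobs.put(None)  # blocks until the worker frees a slot, then stops the worker
thread.join()
```

The point of the bounded queue is that the API never does the slow work itself and never piles up unlimited jobs: once the queue is full, the caller finds out immediately.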

Event Sourcing

Event sourcing should be applied narrowly: cases emit an append-only stream of domain events, persisted in PostgreSQL.

  • Scope: Only case entities and relationships. Avoid system-wide streaming dogma.
  • Value: Temporal queries (“state at T”), auditability (“who changed what”), and traceability across transforms and user actions.
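
As a rough illustration of “state at T”, the sketch below folds an in-memory list of case events into a case state. Everything here is an assumption made for the example: the event kinds (`entity_added`, `entity_removed`, `link_added`), the payload fields, and the `CaseEvent`/`CaseState` shapes are hypothetical, and the real log would live in PostgreSQL rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from functools import reduce


@dataclass(frozen=True)
class CaseEvent:
    at: datetime   # when the event happened
    actor: str     # who (user or transform) caused it
    kind: str      # hypothetical event name
    payload: dict


@dataclass
class CaseState:
    entities: dict = field(default_factory=dict)  # entity id -> label
    links: set = field(default_factory=set)       # (source id, target id)


def apply_event(state: CaseState, event: CaseEvent) -> CaseState:
    """Apply one event to the state and return the updated state."""
    if event.kind == "entity_added":
        state.entities[event.payload["id"]] = event.payload["label"]
    elif event.kind == "entity_removed":
        removed = event.payload["id"]
        state.entities.pop(removed, None)
        state.links = {link for link in state.links if removed not in link}
    elif event.kind == "link_added":
        state.links.add((event.payload["source"], event.payload["target"]))
    return state


def state_at(events: list, t: datetime) -> CaseState:
    """Fold every event with a timestamp <= t, in time order, into a fresh state."""
    relevant = sorted((e for e in events if e.at <= t), key=lambda e: e.at)
    return reduce(apply_event, relevant, CaseState())


def ts(hour: int) -> datetime:
    return datetime(2025, 1, 1, hour, tzinfo=timezone.utc)


log = [
    CaseEvent(ts(9), "alice", "entity_added", {"id": "e1", "label": "example.com"}),
    CaseEvent(ts(10), "whois_transform", "entity_added", {"id": "e2", "label": "198.51.100.7"}),
    CaseEvent(ts(10), "whois_transform", "link_added", {"source": "e1", "target": "e2"}),
    CaseEvent(ts(12), "alice", "entity_removed", {"id": "e2"}),
]

morning = state_at(log, ts(11))  # both entities and the link exist
noon = state_at(log, ts(12))     # e2 and its link are gone again
touched_e2 = [e.actor for e in log if e.payload.get("id") == "e2"]
```

Because the log is append-only, “who changed what” is just a filter over the same events, and any past state can be rebuilt by stopping the fold earlier.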

Links:

Frontend client

The TypeScript/Preact client is currently a graph-based UI for visualizing relationships: entities, attributes, links, and the transforms that stitch them together.

Ideas for improvements:

  • Diff-views for entity merges and deduplication.
  • Pair visualizations with an event timeline that reflects the case’s underlying stream over time or for specific time periods.

E2EE?

What should the security objectives be for OSINTBuddy? I’m not sure, I suppose we should do some threat modelling…

Threat Modelling

to be continued…

OSIB Users

to be continued…

OSIB Instances

  • We need to be able to run multiple untrusted plugins concurrently.
    • Risks: plugin escapes, resource exhaustion, image tampering, etc.
    • Mitigations: KVM-based isolation, resource quotas/timeouts, read-only base images + signatures, minimal privileges outside the worker.
  • I’m most likely going to use Firecracker for this. Firecracker is for creating and managing secure, multi-tenant container and function-based services. Think secure and fast microVMs. A bonus is that I could potentially provide a REPL for in-browser plugin (read: entity) development. Imagine a fully fledged bash shell with neovim for development. It’s unlikely I’ll do that, but I can dream. I found this Rust library that was created around 5 months ago by a junior student at Peking University. I’m thinking screw it: no issues, 8 stars (now 9 from me), it’s good enough for this project for now.
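
One small piece of the “read-only base images + signatures” mitigation can be sketched without any Firecracker specifics: refuse to boot an image whose digest does not match a pinned value. Note the assumption: this is plain SHA-256 digest pinning, not a real signature scheme, and `sha256_of`, `verify_image`, and the `rootfs.ext4` filename are hypothetical names for the example.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path: Path, pinned_digest: str) -> bool:
    """Return True only if the image on disk still matches the pinned digest."""
    return hmac.compare_digest(sha256_of(path), pinned_digest)


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "rootfs.ext4"  # hypothetical base image name
    image.write_bytes(b"pretend this is a root filesystem")
    pinned = sha256_of(image)          # would be recorded at build time

    ok_before = verify_image(image, pinned)

    # Simulate tampering between build time and boot time.
    image.write_bytes(b"pretend this is a tampered filesystem")
    ok_after = verify_image(image, pinned)
```

A worker would run a check like this right before starting a microVM and skip the job if it fails. Proper signatures would additionally prove who produced the pinned digest.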

Existing plugin system issues

Some thoughts on how to improve the plugin system…

  • We can currently only return entities directly attached to the source entity being transformed. If we want to return a subgraph of relationships and entities, we are unable to do so.
  • There’s no plugin settings/config system yet that can be loaded from the user’s settings
  • We aren’t able to store metadata/general information on a plugin for the market UI (e.g. a README for a set of plugins and/or a PLUGIN_FILENAME.md for info on a specific plugin)
  • We can’t include external files with plugins. E.g. in the CSE Search plugin we depend on a Python request to a gist of JSON data. This is annoying; instead we should be able to store this JSON data in a plugins repo.
  • We can’t include external dependencies with a plugin. Since we’re going to be running in a Firecracker VM, we should be able to install any new requirements.txt dependencies or package.json (node deps) into the Firecracker environment.
  • Plugins are Python based. Since we’re in a Firecracker VM, we should perhaps load it up with some default tools (e.g. node, nmap) and support multiple languages. For this we need to write docs on the expected structure of JSON returned from plugins.
  • Add a ‘hidden’ element type to plugins to store additional properties. All the visible properties of a plugin are rendered to the UI (including Empty, which is used for element positioning); a hidden property would only show up when the user clicks a selected edge’s or selected entity’s button for showing all properties in a panel.
  • Plugins should be loadable from a list of strings (Python strings for plugins, JSON for resources, .md for metadata; basically we need to note the filetypes loaded from a plugin repository) instead of only from filepaths, which is what is currently used in development mode
  • In dev mode any plugins edited on the UI should automatically write any changes to the filesystem
  • In development mode the filepath to all plugin resources should be a setting on the settings page
  • See if we can add types for when we get a plugin, Registry.get_plugin('type the possible plugin names/strings here?'), not sure how to do this but it would make development a lot nicer
  • Add support for getting plugins by both the snake_case_format and the actual name e.g. Registry.get_plugin('Google Search')
  • If a plugin has a syntax error, the entire plugin system fails in spectacular fashion. We should instead notify the user of the issue and correctly load the rest of the working plugins.
  • Finish implementing support for generators so we can provide progress updates on a plugin transform, e.g. if we are searching 20 pages of Google and storing those results, we should be able to send back the results as we pull them (on the UI: “10 Google result entities returned! Loading more…”)
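
A few of the ideas above (loading plugins from strings, surviving a syntax error, lookup by either name format, and generator-based progress) fit into one small sketch. This is not the existing OSINTBuddy `Registry`; it is a hypothetical stand-in, and the `label` attribute, `load_sources`, the `errors` dict, and the shape of the yielded updates are all assumptions. It also calls `exec` directly, which is only acceptable here because the real thing would run inside the microVM.

```python
import re


class Registry:
    """Hypothetical stand-in for the plugin registry, for illustration only."""

    def __init__(self) -> None:
        self.plugins = {}  # snake_case key -> plugin class
        self.errors = {}   # source name -> error message to show the user

    @staticmethod
    def _key(name: str) -> str:
        # "Google Search" and "google_search" both normalize to "google_search".
        return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

    def load_sources(self, sources: dict) -> None:
        """Load plugins from {filename: source string}, one failure at a time."""
        for filename, code in sources.items():
            namespace = {}
            try:
                exec(compile(code, filename, "exec"), namespace)
            except Exception as exc:  # SyntaxError is an Exception subclass
                self.errors[filename] = f"{type(exc).__name__}: {exc}"
                continue  # keep loading the remaining plugins
            for obj in namespace.values():
                if isinstance(obj, type) and hasattr(obj, "label"):
                    self.plugins[self._key(obj.label)] = obj

    def get_plugin(self, name: str):
        return self.plugins.get(self._key(name))


SOURCES = {
    "google_search.py": '''
class GoogleSearch:
    label = "Google Search"

    def transform(self, query, pages=2):
        for page in range(1, pages + 1):
            # A real plugin would fetch one page of results here.
            yield {"progress": f"page {page}/{pages}",
                   "entities": [f"{query} result {page}"]}
''',
    "broken.py": "class Broken(:\n    pass\n",  # deliberate syntax error
}

registry = Registry()
registry.load_sources(SOURCES)

plugin = registry.get_plugin("Google Search")
same = registry.get_plugin("google_search")
updates = list(plugin().transform("osintbuddy"))
```

In this sketch the broken file ends up in `registry.errors` instead of taking everything down, and each item yielded by `transform` is something the worker could forward to the UI as it arrives.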

to be continued…

Footnotes

  1. https://users.rust-lang.org/t/build-an-actix-web-endpoint-to-see-live-changes-of-my-redis-cache/109333

  2. https://designpatternsmastery.com/1/13/3/1/

  3. https://designpatternsmastery.com/1/10/4/

  4. https://designpatternsmastery.com/1/14/1/1/#concurrency-patterns

  5. https://stackoverflow.com/questions/42471870/publish-subscribe-vs-producer-consumer

  6. https://github.com/dezashibi-c/a-prod_cons_vs_pub_sub_in_c

  7. https://dev.to/aaravjoshi/6-essential-websocket-patterns-for-real-time-applications-39gf

  8. https://docs.rs/actix/latest/actix/trait.AsyncContext.html

  9. https://meta.discourse.org/t/introduction-to-discourse-development/349939

  10. https://users.rust-lang.org/t/async-queue-with-concurrent-batch/75138

  11. https://peerdh.com/blogs/programming-insights/implementing-a-rust-based-sandbox-for-isolated-command-execution

  12. https://codezup.com/building-high-performance-python-extensions-with-rust-guide/

  13. https://nullderef.com/blog/plugin-tech/

  14. https://medium.com/devsphere/implementing-rust-based-plugins-for-existing-software-ecosystems-264054028c59

  15. https://github.com/wax911/plugin-architecture

  16. https://github.com/topics/plugin-system?l=rust

  17. https://www.reddit.com/r/rust/comments/6v29z0/plugin_system_with_api/

  18. https://mathieularose.com/plugin-architecture-in-python

  19. https://pyquesthub.com/creating-a-dynamic-plugin-system-in-python

  20. https://colliery.io/blog/rust-python-pattern/

  21. https://medium.com/@kudryavtsev_ia/how-i-design-and-develop-real-world-python-extensions-in-rust-2abfe2377182

  22. https://reorchestrate.com/posts/plugins-for-rust/

  23. https://medium.com/rustaceans/a-plugin-system-for-rust-but-not-only-using-webassembly-26bb3d327c10

  24. https://blog.herecura.eu/blog/2020-05-21-toying-around-with-firecracker/

  25. https://lib.rs/crates/firecracker-rs-sdk

  26. https://microservices.io/patterns/data/transactional-outbox.html

  27. https://www.bytefish.de/blog/outbox_events_postgres_dotnet.html

  28. https://microservices.io/patterns/data/event-sourcing.html

  29. https://martinfowler.com/eaaDev/EventSourcing.html

  30. https://event-driven.io/en/the_end_is_near_for_crud_data/

  31. https://kylecordes.com/2014/task-based-user-interfaces

  32. https://en.wikipedia.org/wiki/Bitemporal_modeling

  33. https://zfnd.org/so-you-want-to-build-an-end-to-end-encrypted-web-app/

  34. https://thomasbandt.com/browser-based-end-to-end-encryption-overview

  35. https://threatmodel.co/blog/threat-modeling-with-attack-trees

  36. https://en.wikipedia.org/wiki/STRIDE_model

  37. https://threat-modeling.com/pasta-threat-modeling/

  38. https://en.wikipedia.org/wiki/Fold_(higher-order_function)

  39. https://web.archive.org/web/20230402144220/https://docs.gigaspaces.com/sbp/master-worker-pattern.html

  40. https://medium.com/@_JeffPoole/thoughts-on-push-vs-pull-architectures-666f1eab20c2

  41. https://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol