
Distributed Reasoning Plane

AnamDB is built to scale from a single-node embedded database kernel to a distributed multi-agent reasoning plane.

In distributed setups, the database coordinates neural perception and symbolic reasoning workloads across heterogeneous clusters.


The 5-Stage Symbolic Integration Pipeline

Every query execution is divided into five distinct stages, so that each stage can be optimized separately and scheduled concurrently with other work:

  1. Stage 1 — Data Preprocessing: Row columns are read from Lance storage and transposed into vector-symbolic Arrow layouts.
  2. Stage 2 — Neural-Symbolic Embedding: Features are extracted and inference is run on ONNX models with first-order logic constraints.
  3. Stage 3 — Domain Knowledge: Intermediary scores are cross-referenced and augmented using active domain ontologies.
  4. Stage 4 — Logical Reasoning: The logic engine applies active Datalog rules over neural outputs and ontology mappings.
  5. Stage 5 — Symbolic Postprocessing: Final constraints, BCNF compliance, and schema requirements are verified before result compilation.
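The five stages can be pictured as a chain of functions that each take and return a shared context. The following is a minimal sketch, not AnamDB's actual API: `QueryContext`, the stage function names, the toy ontology, and the single hard-coded rule are all hypothetical, and the "inference" step is a stand-in for a real ONNX model call.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class QueryContext:
    """Hypothetical state handed from one stage to the next."""
    data: dict[str, Any]
    completed: list[str] = field(default_factory=list)


def preprocess(ctx):
    # Stage 1: transpose row dicts into a column-oriented layout.
    rows = ctx.data["rows"]
    ctx.data["columns"] = {key: [r[key] for r in rows] for key in rows[0]}
    return ctx


def embed(ctx):
    # Stage 2: stand-in for model inference (a real system would run a model here).
    texts = ctx.data["columns"]["text"]
    ctx.data["scores"] = [min(len(t) / 10.0, 1.0) for t in texts]
    return ctx


def apply_domain_knowledge(ctx):
    # Stage 3: augment the rows with a toy ontology lookup (assumed mapping).
    ontology = {"invoice": "financial_document"}
    kinds = ctx.data["columns"]["kind"]
    ctx.data["concepts"] = [ontology.get(k, "unknown") for k in kinds]
    return ctx


def reason(ctx):
    # Stage 4: one hard-coded rule standing in for Datalog evaluation:
    #   accepted(X) :- concept(X, financial_document), score(X) >= 0.5.
    ctx.data["accepted"] = [
        c == "financial_document" and s >= 0.5
        for c, s in zip(ctx.data["concepts"], ctx.data["scores"])
    ]
    return ctx


def postprocess(ctx):
    # Stage 5: check the result shape, then compile the final rows.
    if len(ctx.data["accepted"]) != len(ctx.data["rows"]):
        raise ValueError("result length must match input length")
    ids = ctx.data["columns"]["id"]
    ctx.data["result"] = [
        {"id": i, "accepted": a} for i, a in zip(ids, ctx.data["accepted"])
    ]
    return ctx


PIPELINE = [preprocess, embed, apply_domain_knowledge, reason, postprocess]


def run_pipeline(rows):
    """Run every stage in order and record which stages completed."""
    ctx = QueryContext(data={"rows": rows})
    for stage in PIPELINE:
        ctx = stage(ctx)
        ctx.completed.append(stage.__name__)
    return ctx
```

Keeping each stage behind the same signature is what would let a planner reorder, parallelize, or relocate stages independently; the sketch runs them strictly in sequence for clarity.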

BCNF Policy Catalog

All rules and constraints are stored in a strict Boyce-Codd Normal Form (BCNF) database catalog.

  • Incremental Replication: Updates, additions, or deprecations of Datalog rules are propagated throughout the cluster via version-stamped changeset deltas.
  • Conflict Resolution: Version stamps guarantee that all nodes evaluate a query against the same logic snapshot, preventing state drift during query execution.
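To make the replication idea concrete, here is a minimal sketch of a replica that applies version-stamped rule deltas strictly in order. It assumes a single, gap-free integer version sequence; `RuleDelta`, `RuleCatalogReplica`, and the `"upsert"` / `"deprecate"` operation names are hypothetical and are not taken from AnamDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleDelta:
    version: int        # version stamp the catalog has after this delta
    op: str             # "upsert" or "deprecate" (assumed operation names)
    rule_id: str
    body: str = ""


class RuleCatalogReplica:
    """One node's copy of the rule catalog."""

    def __init__(self):
        self.version = 0
        self.rules = {}

    def apply(self, delta):
        """Apply the next delta in sequence; skip duplicates, reject gaps."""
        if delta.version <= self.version:
            return False  # already applied, safe to ignore
        if delta.version != self.version + 1:
            raise ValueError(f"missing delta for version {self.version + 1}")
        if delta.op == "upsert":
            self.rules[delta.rule_id] = delta.body
        elif delta.op == "deprecate":
            self.rules.pop(delta.rule_id, None)
        else:
            raise ValueError(f"unknown operation: {delta.op}")
        self.version = delta.version
        return True

    def can_serve(self, query_version):
        # A query pinned to a version is only evaluated on a matching snapshot.
        return self.version == query_version
```

Under these assumptions, two replicas that have applied the same prefix of deltas hold identical rule sets, and a query pinned to version `n` is only served by replicas whose catalog is at exactly `n`. A real system would likely also retain older snapshots so that a replica that has moved ahead can still serve in-flight queries.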

Multi-Agent Task Router

The task router maps execution stages and model operators to the most suitable physical hardware in the cluster:

Figure: Multi-Agent Task Router distributing workloads across Edge, Core, and Hybrid compute nodes.

Cluster workloads are distributed based on node capabilities:

| Node Type   | Resources                  | Ideal Workload          | Description                                                           |
|-------------|----------------------------|-------------------------|-----------------------------------------------------------------------|
| Edge Node   | NPU / GPU, 4 GB RAM        | Perception (OCR, Audio) | Runs lightweight model inference directly where data is ingested.     |
| Core Node   | Multi-core CPU, 64 GB+ RAM | Symbolic Joins (Datalog)| Executes memory-intensive relational aggregates and constraint logic. |
| Hybrid Node | CPU + CUDA GPUs, 32 GB RAM | Mixed (NLP + Datalog)   | Performs complex NLP processing followed by logic rules.              |
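The capability-based mapping above can be sketched as a small routing function. This is an illustration only: the `Node` fields, the workload names, and the rule that perception and mixed workloads require an accelerator are assumptions made for the example, not AnamDB's routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str              # "edge", "core", or "hybrid"
    ram_gb: int
    has_accelerator: bool  # NPU or GPU present


# Preferred node type per workload, following the table above.
PREFERRED_KIND = {
    "perception": "edge",
    "symbolic_join": "core",
    "mixed": "hybrid",
}


def route(workload, min_ram_gb, nodes):
    """Pick a node that can run the workload, preferring the ideal node type."""
    preferred = PREFERRED_KIND[workload]
    # Assumption: model-bearing workloads need an accelerator.
    needs_accelerator = workload in ("perception", "mixed")
    candidates = [
        n for n in nodes
        if n.ram_gb >= min_ram_gb and (n.has_accelerator or not needs_accelerator)
    ]
    if not candidates:
        raise LookupError(f"no node can run {workload!r} with {min_ram_gb} GB")
    # Matching node type first, then the node with the most RAM.
    return min(candidates, key=lambda n: (n.kind != preferred, -n.ram_gb))


CLUSTER = [
    Node("edge-1", "edge", 4, True),
    Node("core-1", "core", 64, False),
    Node("hybrid-1", "hybrid", 32, True),
]
```

With this cluster, `route("perception", 2, CLUSTER)` selects `edge-1`, while a perception job that needs 8 GB falls through to `hybrid-1` because the edge node lacks the memory.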

Network-Aware Distributed Optimizer

When executing across a network, the multi-objective Pareto optimizer incorporates network overhead (data serialization and transfer times) alongside compute latencies:

Figure: Distributed Pareto Frontier showing compute and network latency breakdown for each deployment option.
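A minimal sketch of how network overhead can enter a Pareto comparison follows. It assumes two objectives, total latency (compute plus network, to be minimized) and model accuracy (to be maximized); the source does not name the optimizer's objectives, so the `accuracy` field, the `Plan` type, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    compute_ms: float
    network_ms: float   # serialization plus transfer time
    accuracy: float     # assumed second objective

    @property
    def total_ms(self):
        return self.compute_ms + self.network_ms


def dominates(a, b):
    """True if plan a is at least as good as b on both objectives and better on one."""
    no_worse = a.total_ms <= b.total_ms and a.accuracy >= b.accuracy
    strictly_better = a.total_ms < b.total_ms or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_frontier(plans):
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Made-up deployment options for illustration.
PLANS = [
    Plan("edge_small", compute_ms=40, network_ms=5, accuracy=0.85),
    Plan("core_small", compute_ms=35, network_ms=60, accuracy=0.85),
    Plan("core_base", compute_ms=30, network_ms=60, accuracy=0.92),
    Plan("hybrid_large", compute_ms=80, network_ms=40, accuracy=0.97),
]
```

In this example `core_small` has lower compute latency than `edge_small`, but once the 60 ms network cost is included it is slower at equal accuracy and drops off the frontier. That is the effect of folding network overhead into the comparison.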

Progressive Query Rewrite

If a remote edge node becomes congested or network latency spikes at runtime, the coordinator dynamically rewrites the physical query plan. It transparently shifts the perception operator to a base model on a core node, so the query still meets its latency constraint instead of failing.
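The rewrite step can be sketched as a function that swaps the placement of the perception operator when its current node exceeds the latency budget. The `Placement` type, the per-node latency map, and the node and model names are hypothetical; a real coordinator would also need to confirm that the fallback node itself fits the budget.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Placement:
    operator: str
    node: str
    model: str


def rewrite_plan(plan, observed_ms, budget_ms, fallback):
    """Return a plan with congested perception operators moved to the fallback.

    plan        -- list of Placement objects (the physical plan)
    observed_ms -- dict mapping node name to its current latency estimate
    budget_ms   -- latency constraint for the operator
    fallback    -- Placement giving the core node and base model to use instead
    """
    rewritten = []
    for p in plan:
        congested = observed_ms.get(p.node, 0.0) > budget_ms
        if p.operator == "perception" and congested:
            # Shift the operator rather than failing the query.
            rewritten.append(replace(p, node=fallback.node, model=fallback.model))
        else:
            rewritten.append(p)
    return rewritten


PLAN = [
    Placement("perception", "edge-1", "edge-small"),
    Placement("reasoning", "core-1", "datalog"),
]
FALLBACK = Placement("perception", "core-1", "base")
```

Calling `rewrite_plan(PLAN, {"edge-1": 250.0, "core-1": 40.0}, 100.0, FALLBACK)` moves the perception operator to `core-1` with the `base` model and leaves the reasoning operator untouched.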


Global Lineage

Global lineage traces results across physical cluster hops. Developers can inspect the exact path a tuple took, which models processed it, and how its confidence score changed from hop to hop:

Figure: Global Lineage trace showing two hops for txn_001 across edge and core nodes.
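One way to picture such a trace is as an ordered list of hops attached to a tuple identifier. The sketch below is an assumed data model, not AnamDB's lineage format: `Hop`, `LineageTrace`, the node and model names, and the confidence values are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hop:
    node: str
    operator: str
    model: str
    confidence: float


@dataclass
class LineageTrace:
    tuple_id: str
    hops: list[Hop] = field(default_factory=list)

    def record(self, node, operator, model, confidence):
        """Append one hop in the order the tuple was processed."""
        self.hops.append(Hop(node, operator, model, confidence))

    def path(self):
        """Nodes the tuple visited, in order."""
        return [h.node for h in self.hops]

    def models(self):
        """Models that processed the tuple, in order."""
        return [h.model for h in self.hops]

    def confidence_deltas(self):
        """Change in confidence between each pair of consecutive hops."""
        return [
            round(b.confidence - a.confidence, 6)
            for a, b in zip(self.hops, self.hops[1:])
        ]


# Illustrative two-hop trace; the values are made up.
trace = LineageTrace("txn_001")
trace.record("edge-1", "perception", "ocr-small", 0.82)
trace.record("core-1", "reasoning", "datalog", 0.91)
```

Here `trace.path()` returns the edge-then-core route, and `trace.confidence_deltas()` exposes how much the reasoning hop changed the score.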

Released under the BSL-1.1 License.