Architecture
Instead of flattening documents into disconnected chunks, TurboDex builds a navigable tree with section lineage, summaries, and query-ready metadata.
Every document uploaded to TurboDex flows through four deterministic stages.
1. Ingest: upload PDF or Markdown files and enqueue indexing through the API or console.
2. TurboDex identifies the table-of-contents (TOC) structure, page boundaries, and hierarchical relationships.
3. The tree is persisted in YottaDB with node-level metadata and retrieval lineage.
4. Queries traverse hierarchy-aware nodes for better answer grounding and summaries.
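The structure-detection stage can be pictured with a small sketch. This is not TurboDex's implementation: `TocNode` and `build_tree` are hypothetical names, and the input is assumed to be a flat list of `(level, title, page)` TOC entries that has already been extracted from the document.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    title: str
    level: int  # 0 = document root, 1 = chapter, 2 = section, ...
    page: int
    children: list["TocNode"] = field(default_factory=list)


def build_tree(entries: list[tuple[int, str, int]], doc_title: str = "Document") -> TocNode:
    """Turn flat (level, title, page) TOC entries into a nested tree."""
    root = TocNode(title=doc_title, level=0, page=1)
    stack = [root]  # stack[-1] is the most recently opened node
    for level, title, page in entries:
        if level < 1:
            raise ValueError(f"TOC levels start at 1, got {level}")
        # Close every open node at the same depth or deeper than the new entry.
        while stack[-1].level >= level:
            stack.pop()
        node = TocNode(title=title, level=level, page=page)
        stack[-1].children.append(node)  # attach to the nearest shallower node
        stack.append(node)
    return root


# Hypothetical TOC for illustration only.
toc = [
    (1, "Management Analysis", 10),
    (2, "Outlook", 12),
    (3, "Risk Factors", 14),
    (1, "Financial Statements", 30),
]
tree = build_tree(toc)
```

Because the stack always holds the chain of currently open ancestors, each entry is attached to its nearest shallower heading in a single pass over the TOC.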
In conventional chunk-based indexing, section lineage is lost at the chunking step. Retrieval can mix semantically similar but structurally unrelated content: to a vector index, a risk-factor paragraph and a financial-projection paragraph can look nearly identical.
With TurboDex, retrieval respects document anatomy. The engine knows that "Section 4.2.1 Risk Factors" sits under "Section 4 Management Analysis" and answers accordingly, with no structural drift.
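To make the lineage idea concrete, here is a minimal sketch that derives ancestry from dotted section numbers. The helper names `lineage` and `is_ancestor` are illustrative assumptions, and real documents may need title-based or layout-based lineage rather than numbering alone.

```python
def lineage(section_number: str) -> list[str]:
    """Return section numbers from the top-level ancestor down to the section itself."""
    parts = section_number.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def is_ancestor(ancestor: str, section: str) -> bool:
    """True when `ancestor` is a strict structural ancestor of `section`."""
    return ancestor in lineage(section)[:-1]


print(lineage("4.2.1"))           # ['4', '4.2', '4.2.1']
print(is_ancestor("4", "4.2.1"))  # True
print(is_ancestor("4", "40.1"))   # False: compared by segment, not by string prefix
```

Comparing whole segments rather than string prefixes is what keeps Section 40.1 from being mistaken for a descendant of Section 4.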
Each node in the TurboDex tree stores structured metadata beyond plain text.
Full path from root to leaf (Root › Chapter › Section › Paragraph), preserved per node.
AI-generated summaries at each structural level, enabling hierarchical answer synthesis.
Physical page ranges, token counts, OCR flags, and document boundary markers per node.
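One way to picture the per-node metadata described so far is a small record type. The class and field names below are assumptions made for illustration; they are not TurboDex's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMetadata:
    path: tuple[str, ...]   # lineage from root to this node
    summary: str            # generated summary for this structural level
    page_start: int         # first physical page covered by the node
    page_end: int           # last physical page covered by the node
    token_count: int
    ocr_used: bool          # True if any page in the range needed OCR
    starts_document: bool   # document boundary marker

    @property
    def depth(self) -> int:
        """Depth in the tree, with the root at depth 0."""
        return len(self.path) - 1

    def breadcrumb(self) -> str:
        """Human-readable lineage string."""
        return " › ".join(self.path)


# Hypothetical values for illustration only.
node = NodeMetadata(
    path=("Root", "Chapter", "Section", "Paragraph"),
    summary="Placeholder summary text.",
    page_start=14,
    page_end=15,
    token_count=512,
    ocr_used=False,
    starts_document=False,
)
print(node.breadcrumb())  # Root › Chapter › Section › Paragraph
```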
Nodes are stored in a high-performance hierarchical key-value store (YottaDB), giving sub-millisecond node lookup without re-embedding.
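The sketch below illustrates the general idea of hierarchical keys, using a plain Python dictionary as an in-memory stand-in. It does not use or represent YottaDB's API; the key layout `(doc_id, *path)` and the function names are assumptions chosen for the example.

```python
from typing import Optional

Key = tuple[str, ...]


def put_node(store: dict[Key, dict], doc_id: str, path: Key, fields: dict) -> None:
    """Store a node's fields under a key built from the document ID and the node path."""
    store[(doc_id, *path)] = dict(fields)


def get_node(store: dict[Key, dict], doc_id: str, path: Key) -> Optional[dict]:
    """Direct lookup by key: no embedding or similarity search is involved."""
    return store.get((doc_id, *path))


def children(store: dict[Key, dict], doc_id: str, path: Key) -> list[Key]:
    """Keys exactly one level below `path`. A real ordered store would walk
    adjacent keys instead of scanning every key as this stand-in does."""
    prefix = (doc_id, *path)
    n = len(prefix)
    return sorted(k for k in store if len(k) == n + 1 and k[:n] == prefix)


store: dict[Key, dict] = {}
put_node(store, "doc1", ("Management Analysis",), {"pages": (10, 29)})
put_node(store, "doc1", ("Management Analysis", "Risk Factors"), {"pages": (14, 18)})
put_node(store, "doc1", ("Financial Statements",), {"pages": (30, 45)})
```

Because the key itself encodes the lineage, fetching a node or listing its children is a key operation rather than a vector search.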
Every node is indexed and traversable by title, depth, parent, child, and semantic distance.
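As a rough illustration of traversal by title, depth, parent, and child, the following sketch builds simple secondary indexes over node paths. `build_indexes` is a hypothetical helper, paths are assumed to be non-empty tuples of titles, and semantic distance is left out because it would require an embedding model.

```python
from collections import defaultdict

Path = tuple[str, ...]


def build_indexes(paths: list[Path]) -> dict[str, dict]:
    """Build title, depth, parent, and child lookups from non-empty node paths."""
    by_title = defaultdict(list)
    by_depth = defaultdict(list)
    children_of = defaultdict(list)
    parent_of = {}
    for path in paths:
        by_title[path[-1]].append(path)       # last element is the node's own title
        by_depth[len(path) - 1].append(path)  # root sits at depth 0
        if len(path) > 1:
            parent = path[:-1]
            parent_of[path] = parent
            children_of[parent].append(path)
    return {"title": by_title, "depth": by_depth, "parent": parent_of, "children": children_of}


# Hypothetical paths for illustration only.
paths = [
    ("Report",),
    ("Report", "Management Analysis"),
    ("Report", "Management Analysis", "Risk Factors"),
    ("Report", "Financial Statements"),
]
indexes = build_indexes(paths)
print(indexes["parent"][("Report", "Management Analysis", "Risk Factors")])
# ('Report', 'Management Analysis')
```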
Image-only pages are detected automatically and fall back to OCR. The OCR page count is tracked per document.
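A simple heuristic for this kind of detection might look like the sketch below. The character threshold and the function names are assumptions for illustration, not TurboDex's actual detection logic: a page whose text layer yields almost no characters is treated as image-only.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Treat a page as image-only when its text layer is (nearly) empty.
    The 20-character threshold is an arbitrary illustrative default."""
    return len(extracted_text.strip()) < min_chars


def count_ocr_pages(page_texts: list[str], min_chars: int = 20) -> int:
    """Number of pages in a document that would fall back to OCR."""
    return sum(needs_ocr(text, min_chars) for text in page_texts)


# Hypothetical per-page text-layer output for a three-page document.
pages = ["A full page of extracted text about revenue.", "", "  \n"]
print(count_ocr_pages(pages))  # 2
```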
Start in SaaS for immediate access, or model private-cloud economics if compliance and data control are priorities.