
Architecture

TurboDex keeps document hierarchy intact so your AI answers with context, not fragments.

Instead of flattening documents into disconnected chunks, TurboDex builds a navigable tree with section lineage, summaries, and query-ready metadata.

The Four-Stage Pipeline

Every document uploaded to TurboDex flows through four deterministic stages.

1. Ingest

Upload PDF or Markdown files, then enqueue them for indexing through the API or the console.
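The ingest step amounts to validating the upload type and placing an indexing job on a queue. The sketch below illustrates that idea with an in-memory queue. `IndexJob`, `enqueue`, and `SUPPORTED_SUFFIXES` are hypothetical names, and TurboDex's real API and console workflow are not shown here.

```python
from collections import deque
from dataclasses import dataclass
from pathlib import Path

# Hypothetical allow-list; the real service may accept other extensions.
SUPPORTED_SUFFIXES = {".pdf", ".md", ".markdown"}


@dataclass
class IndexJob:
    doc_id: str
    path: Path
    kind: str  # "pdf" or "markdown"


def enqueue(queue: deque, doc_id: str, filename: str) -> IndexJob:
    """Validate the upload type, then append an indexing job to the queue."""
    path = Path(filename)
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    kind = "pdf" if suffix == ".pdf" else "markdown"
    job = IndexJob(doc_id, path, kind)
    queue.append(job)
    return job


if __name__ == "__main__":
    jobs: deque = deque()
    enqueue(jobs, "doc-1", "annual_report.PDF")
    enqueue(jobs, "doc-2", "handbook.md")
    print([(j.doc_id, j.kind) for j in jobs])
```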

2. Parse

TurboDex detects table-of-contents (TOC) structure, page boundaries, and the hierarchical relationships between sections.
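One way to recover hierarchy from a TOC is to read dotted section numbers: "4.2.1" belongs under "4.2", which belongs under "4". The sketch below assumes headings arrive in reading order as `(number, title, page)` tuples. That input shape and the `Node`/`build_tree` names are illustrative assumptions, not TurboDex's actual parser, which also has to handle unnumbered headings.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    number: str  # dotted section number, e.g. "4.2.1"; "" for the root
    title: str
    page: int    # page on which the heading appears
    children: list["Node"] = field(default_factory=list)


def build_tree(headings: list[tuple[str, str, int]]) -> Node:
    """Attach each numbered heading to its nearest existing numbered ancestor."""
    root = Node("", "Root", 1)
    by_number = {"": root}
    for number, title, page in headings:
        node = Node(number, title, page)
        parts = number.split(".")
        parent = root
        # Try the longest dotted prefix first ("4.2"), then shorter ones ("4").
        for depth in range(len(parts) - 1, 0, -1):
            candidate = ".".join(parts[:depth])
            if candidate in by_number:
                parent = by_number[candidate]
                break
        parent.children.append(node)
        by_number[number] = node
    return root


if __name__ == "__main__":
    toc = [("4", "Management Analysis", 30), ("4.1", "Overview", 30),
           ("4.2", "Outlook", 33), ("4.2.1", "Risk Factors", 34)]
    tree = build_tree(toc)
    print([c.number for c in tree.children[0].children])
```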

3. Persist

The tree is persisted in YottaDB with node-level metadata and retrieval lineage.
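In a hierarchical key-value store, a node's lineage can be spelled out directly in its key's subscripts, so a subtree is simply a key prefix. The sketch below uses a plain dict keyed by subscript tuples to stand in for a YottaDB global such as `^tdx("doc-1","4","2","1","title")`. The key layout and the helper names are hypothetical, and no YottaDB client API is used.

```python
from typing import Optional

# Stand-in for a subscripted global: (doc_id, *section_parts, field) -> value.
# Hypothetical layout; TurboDex's actual YottaDB schema is not documented here.
Store = dict[tuple[str, ...], str]


def put_node(store: Store, doc_id: str, number: str, **fields: str) -> None:
    """Write a node's fields under subscripts that encode its lineage."""
    path = (doc_id, *number.split("."))
    for name, value in fields.items():
        store[(*path, name)] = value


def get_field(store: Store, doc_id: str, number: str, name: str) -> Optional[str]:
    """Direct lookup: the key is computed from the lineage, nothing is searched."""
    return store.get((doc_id, *number.split("."), name))


def subtree_keys(store: Store, doc_id: str, number: str) -> list[tuple[str, ...]]:
    """Every key under a section, found by matching the subscript prefix."""
    prefix = (doc_id, *number.split("."))
    return sorted(k for k in store if k[: len(prefix)] == prefix)
```

A real YottaDB global would return subscripts in its own collation order and would not need a full scan to find a prefix. The dict version only shows how the keys are shaped.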

4. Retrieve

Queries traverse the hierarchy rather than a flat list of chunks, which improves answer grounding and summary quality.
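A simple form of hierarchy-aware retrieval descends from the root, picks the most relevant child at each level, and returns the selected node together with its ancestors as context. The sketch below uses greedy descent with keyword overlap as a stand-in for semantic scoring. That is a swapped-in technique for illustration only, not TurboDex's retrieval algorithm, and `Node`, `score`, and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    title: str
    summary: str
    # Excluded from repr/eq so the parent <-> child cycle never recurses.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def score(node: Node, query: str) -> int:
    # Stand-in relevance: number of query terms found in title and summary.
    terms = set(query.lower().split())
    words = set(f"{node.title} {node.summary}".lower().split())
    return len(terms & words)


def retrieve(root: Node, query: str) -> list[str]:
    """Greedily descend to the best child, then return root-to-node context."""
    node = root
    while node.children:
        best = max(node.children, key=lambda c: score(c, query))
        if score(best, query) == 0:
            break  # no child matches; stop at the current level
        node = best
    context = []
    current: Optional[Node] = node
    while current is not None:
        context.append(f"{current.title}: {current.summary}")
        current = current.parent
    return list(reversed(context))
```

Returning the whole ancestor chain, rather than the single best node, is what lets the answer state where in the document a passage sits.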

Traditional RAG Pipeline vs TurboDex Pipeline

Traditional Chunk / Vector Flow

Document → Chunking → Embeddings → Vector Search → Candidate Snippets → Answer

Section lineage is lost at the chunking step. Retrieval can then mix semantically similar but structurally unrelated content: to a vector index, a risk-factor paragraph and a financial-projection paragraph can look nearly interchangeable, and nothing records which section each one came from.

TurboDex Hierarchical Flow

Document → Structural Parsing → Hierarchy Tree → Node Summaries → Context Traversal → Answer

Retrieval respects document anatomy. The engine knows that "Section 4.2.1 Risk Factors" is a descendant of "Section 4 Management Analysis" and answers accordingly, so retrieved context does not drift across unrelated sections.
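Once each candidate carries its section number, structural relationships can be checked directly instead of being inferred from similarity. The sketch below assumes snippets tagged with dotted section numbers as `(section, text)` tuples, which is a hypothetical representation. It shows that "4.2.1" is a descendant of "4" and filters candidates to a single section's subtree.

```python
def is_descendant(number: str, ancestor: str) -> bool:
    """True if dotted section `number` lies strictly below `ancestor`."""
    child_parts = number.split(".")
    anc_parts = ancestor.split(".")
    # Compare whole components so "4.20" is not mistaken for a child of "4.2".
    return len(child_parts) > len(anc_parts) and child_parts[: len(anc_parts)] == anc_parts


def scope_candidates(candidates: list[tuple[str, str]], section: str) -> list[tuple[str, str]]:
    """Keep only snippets from `section` itself or from one of its descendants."""
    return [c for c in candidates if c[0] == section or is_descendant(c[0], section)]


if __name__ == "__main__":
    hits = [("4.2.1", "risk factor text"), ("5.3", "projection text"), ("4", "intro text")]
    print(scope_candidates(hits, "4"))
```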

What the Hierarchy Tree Captures

Each node in the TurboDex tree stores structured metadata beyond plain text.

Section Lineage

The full path from root to leaf (Root › Chapter › Section › Paragraph) is preserved on every node.
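Lineage can be derived from parent pointers by walking upward and reversing the result. The sketch below assumes nodes are stored as `node_id -> (parent_id, title)`. This is a hypothetical record shape, and the cycle check guards against malformed input.

```python
from typing import Optional


def lineage(nodes: dict[str, tuple[Optional[str], str]], node_id: str) -> list[str]:
    """Titles from the root down to `node_id`, following parent pointers."""
    path: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = node_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        parent_id, title = nodes[current]
        path.append(title)
        current = parent_id
    return path[::-1]


def lineage_label(nodes: dict[str, tuple[Optional[str], str]], node_id: str) -> str:
    return " › ".join(lineage(nodes, node_id))
```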

Node Summaries

AI-generated summaries at each structural level, enabling hierarchical answer synthesis.
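Hierarchical summaries are naturally built bottom-up: summarize the leaves first, then summarize each parent from its own text plus its children's summaries. In the sketch below, `summarize` is a placeholder that joins and truncates text where a real system would call a model. `Section`, `summarize`, and `summarize_tree` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str = ""
    children: list["Section"] = field(default_factory=list)
    summary: str = ""


def summarize(texts: list[str], limit: int = 80) -> str:
    # Placeholder for an LLM call: join non-empty inputs and truncate.
    joined = " ".join(t for t in texts if t)
    return joined if len(joined) <= limit else joined[: limit - 3] + "..."


def summarize_tree(node: Section) -> str:
    """Post-order traversal: children are summarized before their parent."""
    child_summaries = [summarize_tree(c) for c in node.children]
    node.summary = summarize([node.text, *child_summaries])
    return node.summary
```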

Page & Token Metadata

Physical page ranges, token counts, OCR flags, and document boundary markers per node.
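Per-node metadata can be rolled up the tree: a parent spans its children's pages, its token count is their sum, and it is flagged as OCR if any child was. The sketch below assumes these field names and uses a whitespace split as a stand-in tokenizer. Both are illustrative, and document boundary markers are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMeta:
    start_page: int
    end_page: int
    token_count: int
    ocr: bool  # True if any page in the range came from OCR


def leaf_meta(text: str, start_page: int, end_page: int, ocr: bool = False) -> NodeMeta:
    if end_page < start_page:
        raise ValueError("end_page precedes start_page")
    # Whitespace split stands in for the real tokenizer.
    return NodeMeta(start_page, end_page, len(text.split()), ocr)


def merge(children: list[NodeMeta]) -> NodeMeta:
    """Roll child metadata up to the parent node."""
    return NodeMeta(
        start_page=min(c.start_page for c in children),
        end_page=max(c.end_page for c in children),
        token_count=sum(c.token_count for c in children),
        ocr=any(c.ocr for c in children),
    )
```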

YottaDB Persistence

Trees are stored in a high-performance hierarchical key-value store, giving sub-millisecond node lookup without re-embedding.

Query-Ready Metadata

Every node is indexed and traversable by title, depth, parent, child, and semantic distance.
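Title, depth, parent, and child lookups can all be served from secondary indexes built once over the node records. The sketch below covers those four. Semantic distance is left out because it depends on an embedding model, and `Row` and `TreeIndex` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    node_id: str
    title: str
    parent_id: Optional[str]


class TreeIndex:
    """Secondary indexes over node rows for title, depth, parent and child lookups."""

    def __init__(self, rows: list[Row]):
        self.rows = {r.node_id: r for r in rows}
        self.by_title: defaultdict[str, list[str]] = defaultdict(list)
        self.children: defaultdict[str, list[str]] = defaultdict(list)
        self.by_depth: defaultdict[int, list[str]] = defaultdict(list)
        for r in rows:
            self.by_title[r.title.lower()].append(r.node_id)
            if r.parent_id is not None:
                self.children[r.parent_id].append(r.node_id)
        for r in rows:
            self.by_depth[self.depth(r.node_id)].append(r.node_id)

    def depth(self, node_id: str) -> int:
        """Number of edges between the node and the root."""
        d, parent = 0, self.rows[node_id].parent_id
        while parent is not None:
            d += 1
            parent = self.rows[parent].parent_id
        return d

    def parent(self, node_id: str) -> Optional[str]:
        return self.rows[node_id].parent_id
```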

OCR & Scanned PDF Support

Automatic OCR detection and fallback for image-only pages. OCR page count tracked per document.
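A common heuristic for spotting image-only pages is to check whether the extracted text layer is nearly empty, and to run OCR only on those pages. The sketch below takes already-extracted page texts plus an OCR callback. The 20-character threshold, the callback, and the `resolve_pages` name are assumptions for illustration, not TurboDex's actual detection rule.

```python
from typing import Callable


def resolve_pages(
    page_texts: list[str],
    run_ocr: Callable[[int], str],
    min_chars: int = 20,
) -> tuple[list[str], int]:
    """Return final text per page and how many pages fell back to OCR.

    A page is treated as image-only when its text layer has fewer than
    `min_chars` non-whitespace characters (a heuristic threshold).
    """
    final: list[str] = []
    ocr_count = 0
    for i, text in enumerate(page_texts):
        visible = sum(1 for ch in text if not ch.isspace())
        if visible < min_chars:
            final.append(run_ocr(i))  # hypothetical OCR hook, called by page index
            ocr_count += 1
        else:
            final.append(text)
    return final, ocr_count
```

The returned count corresponds to the per-document OCR page count described above.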

Ready to test with your own document set?

Start in SaaS for immediate access, or model private-cloud economics if compliance and data control are priorities.