Store vectors, full-text, and structured data in one database. Retrieve with hybrid search and fusion re-ranking. Serve fresh, multi-modal context to LLMs and agents in milliseconds.
CDC, Kafka streaming, and HTTP ingest at 10M rows/s. New data becomes searchable in ~1 second. No more stale embeddings producing confidently wrong answers.
Hybrid Search fuses vector similarity, BM25 keyword matching, and SQL filters with Reciprocal Rank Fusion. 94% relevance vs 58% with pure vector search.
Vectors, full-text, structured tables, and semi-structured JSON in one database. Replace Pinecone + Elasticsearch + PostgreSQL with one query path.
Policy docs, support tickets, product specs — chunked, embedded, and searchable via hybrid search. The LLM retrieves current context, not a 6-month-old PDF snapshot.
AISpeech manages 10B+ samples across 500TB with millisecond version switching, 80% storage reduction, and full data lineage for model reproducibility.
Horizon Robotics queries nearly 1 trillion records across four search modes — text, vector embeddings, bitmap labels, and JSON metadata — on a single engine.
Agents query VeloDB directly via its MCP Server (Model Context Protocol). Hybrid search returns structured + semantic context in sub-100ms for agent reasoning loops.
Raw data enters through CDC, streaming, or HTTP push. An embedding pipeline like CocoIndex transforms, chunks, and embeds your documents with incremental processing -- only changed documents are reprocessed. VeloDB stores everything in one engine and serves AI applications through hybrid search via MCP Server, MySQL protocol, or REST API.
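The incremental step is the key to freshness at scale. A minimal sketch of the idea -- not CocoIndex's actual API -- where `embed` is a stand-in for a real embedding-model call and a content-hash cache stands in for the pipeline's change tracking:

```python
import hashlib

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model call.
    return [float(len(text))]

def incremental_embed(docs: dict[str, str], cache: dict[str, str]) -> dict[str, list[float]]:
    """Re-embed only documents whose content hash changed since the last run."""
    updates = {}
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if cache.get(doc_id) != digest:  # new or modified document
            updates[doc_id] = embed(text)
            cache[doc_id] = digest
    return updates

cache: dict[str, str] = {}
first = incremental_embed({"policy.md": "v1", "spec.md": "v1"}, cache)
second = incremental_embed({"policy.md": "v2", "spec.md": "v1"}, cache)
print(len(first), len(second))  # 2 1 -- second run re-embeds only the changed doc
```

On the first run every document is embedded; on the second, only the modified `policy.md` is reprocessed.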
VeloDB doesn't run three searches in parallel and hope for the best. It progressively narrows the candidate set at each stage -- structured filtering first, keyword matching second, vector search third. Expensive operations run on a tiny, pre-filtered subset.
B-tree indexes, zone maps, and partition pruning apply hard constraints -- location, date, category, access permissions. The candidate set drops from millions to thousands before any search operation begins.
Inverted indexes with global BM25 scoring find exact keyword matches. Unlike per-segment scoring (which caused ranking instability at ByteDance), VeloDB calculates statistics across the entire table. Supports English, Chinese, and multilingual tokenizers.
HNSW indexes run semantic search on the pre-filtered subset. IVPQ compresses 768-dim vectors from 3KB to 8 bytes (384x ratio). Reciprocal Rank Fusion combines rankings from all three stages: documents that score well across multiple signals rank highest.
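Reciprocal Rank Fusion itself is a small formula: each document scores the sum of 1 / (k + rank) across the input rankings, with k conventionally set to 60. A minimal sketch of the fusion step (not VeloDB's internal implementation):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d2", "d1"]              # keyword ranking
vector = ["d2", "d3", "d1"]      # semantic ranking
fused = rrf([bm25, vector])
print(fused)  # ['d2', 'd1', 'd3']
```

`d2` wins because it ranks well in both lists; `d1` beats `d3` because it appears in both rankings even though `d3` outranks it in the vector list. That is the "score well across multiple signals" behavior in action.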
ByteDance needed to search 1 billion+ vectors for talent matching. Pure vector search delivered only 58% relevance -- a recruiter searching for Python developers in San Francisco got candidates from Seattle. Rankings shuffled every time database segments merged because BM25 was calculated per-segment, not globally.
Apache Doris 4.0 with progressive filtering changed everything: structured constraints first (50ms), keyword matching second (200ms), vector search third (100ms), fusion ranking last (50ms). IVPQ compression reduced 768-dimension vectors from 3,072 bytes to 8 bytes -- a 384x compression ratio. Relevance jumped from 58% to 94%. Latency dropped from 2.8 seconds to 400 milliseconds. Memory shrank from 10TB across 20-30 servers to 500GB on one server.
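The 384x figure follows from product quantization arithmetic: 768 float32 dimensions occupy 3,072 bytes; splitting the vector into 8 subvectors of 96 dimensions and replacing each with a one-byte codebook index leaves 8 bytes. A toy sketch of the PQ encoding step -- random codebooks stand in for trained k-means centroids, and the inverted-file (IVF) half of IVPQ is omitted:

```python
import random

DIM, M, K = 768, 8, 256   # 8 subspaces, 256 centroids each -> 8 one-byte codes
SUB = DIM // M            # 96 dimensions per subspace

random.seed(0)
# Toy codebooks: in practice these come from k-means over training vectors.
codebooks = [[[random.gauss(0, 1) for _ in range(SUB)] for _ in range(K)]
             for _ in range(M)]

def encode(vec: list[float]) -> bytes:
    """Product quantization: map each subvector to its nearest centroid index."""
    codes = []
    for m in range(M):
        sub = vec[m * SUB:(m + 1) * SUB]
        best = min(range(K), key=lambda j: sum(
            (a - b) ** 2 for a, b in zip(sub, codebooks[m][j])))
        codes.append(best)
    return bytes(codes)

vec = [random.gauss(0, 1) for _ in range(DIM)]
code = encode(vec)
print(len(code), (DIM * 4) // len(code))  # 8 bytes stored, 384x vs float32
```

The trade-off is lossy: search computes approximate distances against centroids, which is why PQ is paired with a re-ranking stage over the pre-filtered candidates.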
Conversational AI company managing 10 billion+ multimodal training samples across 500TB. Before: training data was scattered across separate storage systems, each maintained manually by a different team. Conflicting data versions undermined model consistency, and algorithm engineers wasted time searching for and re-annotating data. After building on Apache Doris: columnar storage compressed annotation data by 80%. Version-based partitioning enables millisecond dataset switching -- active versions stay on SSD while history auto-migrates to HDD. Point-query QPS hit 30,000 with row-store optimization (CPU: 80% --> 10%). The team is now planning an upgrade to Doris 4.0 vector search to fully replace Elasticsearch.
Autonomous driving company processing petabytes daily. Replaced three separate systems -- Hive/Iceberg for analytics, Zilliz for vectors, Elasticsearch for search -- with one Doris engine. Four search modes unified: text, vector, label bitmap operations, and JSON metadata. Engineers stopped hopping between systems. Query times dropped from minutes to seconds on approximately 1 trillion records.
Step-by-step guide to preparing your data for VeloDB's hybrid search. Chunking strategies, embedding models, and indexing best practices.
Read the guide -->
Create HNSW indexes, run vector similarity queries, configure IVPQ compression, and build hybrid search with RRF.
Read the guide -->
14-day free trial for SaaS. 30 days of free compute for BYOC. No credit card required.
Start Free Trial -->
A 30-minute architecture review with a VeloDB solutions engineer. Bring your schema, your query patterns, and your freshness requirements. We will show you where a unified knowledge store eliminates complexity.