Technical whitepaper · v0.9.4

The VeltrixDB whitepaper.

42 pages of how the storage engine actually works — block-by-block diagrams, the durability proof, the sharding model, and the failure modes we test in chaos engineering. Written for the engineer who'll be on call for this in production.

Pages · 42
Author · A. Gupta
Updated · May 2026
License · Closed source · reproducible benchmarks
VeltrixDB · technical reference
v0.9.4-beta · build cb44a1f
42 pages · classified shareable
PDF · 4.8 MB

The extreme point-lookup engine — architecture & durability.

A first-principles walk through the storage layer, the I/O model, the sharding scheme, and the proofs behind a 99.999% durability SLA on commodity NVMe.

42 pages · 18 diagrams · 9 chapters
Why we wrote it

Closed source doesn't mean opaque.

We're closed source — for reasons that are honest and uninteresting. But we shipped a 42-page architectural document so that no CTO ever has to take "trust us" for an answer. If your security-review team wants the engineer who wrote a section on a call, we'll send them. This is the document that makes that conversation possible.

01 Storage engine

Key–value separation, end to end.

The full case for WiscKey-style separation — index layout, VLog segment format, pointer encoding, and the production-hardening changes we made over the original paper.

Chapter 1–2 · 14 pages
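As a rough illustration of the separation idea, the sketch below appends values to an append-only log and keeps only a small pointer in the index. It is a toy in-memory model, not VeltrixDB's on-disk format: the `ValuePointer` fields, the length-prefixed record layout, and the `dict` standing in for the index are all assumptions made for illustration.

```python
import io
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ValuePointer:
    """Hypothetical pointer layout: which segment, where, and how long."""
    segment_id: int
    offset: int
    length: int


class TinyKVStore:
    """Toy key-value separation: values go to a log, the index holds pointers."""

    def __init__(self) -> None:
        self.vlog = io.BytesIO()   # stands in for one value-log segment file
        self.index = {}            # stands in for the real index (key -> pointer)

    def put(self, key: bytes, value: bytes) -> ValuePointer:
        self.vlog.seek(0, io.SEEK_END)
        # Assumed record layout: 4-byte key length, 4-byte value length, key, value.
        header = struct.pack("<II", len(key), len(value))
        record_start = self.vlog.tell()
        self.vlog.write(header + key + value)
        value_offset = record_start + len(header) + len(key)
        ptr = ValuePointer(segment_id=0, offset=value_offset, length=len(value))
        self.index[key] = ptr      # small fixed-size entry; the value stays in the log
        return ptr

    def get(self, key: bytes):
        ptr = self.index.get(key)
        if ptr is None:
            return None
        self.vlog.seek(ptr.offset)
        return self.vlog.read(ptr.length)


store = TinyKVStore()
store.put(b"user:1", b"alice")
store.put(b"user:1", b"alice-v2")   # an overwrite appends; the old value becomes garbage
print(store.get(b"user:1"))
```

Because overwrites only append, the index stays small and the stale values left in the log are what a garbage collector later has to reclaim.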
02 Kernel I/O

io_uring SQPOLL on the hot path.

How we structure the submission and completion queues per shard, why we use O_DIRECT, and the benchmarks comparing io_uring vs pread at our P99 targets.

Chapter 3 · 6 pages
03 Durability

The fsync chain and the 99.999% SLA proof.

Group-commit WAL, replication invariants, the actual probability calculus behind the published durability number, and the conditions under which it does not hold.

Chapter 4 · 5 pages
04 Sharding & placement

1024 shards · consistent hashing · zone-aware.

Why 1024, how we rebalance without read interruption, the operator's reshard procedure, and the cap on individual shard ownership during partition events.

Chapter 5 · 6 pages
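To make the placement idea concrete, here is a minimal sketch of a fixed 1024-shard key space mapped onto nodes with a consistent-hash ring. Only the shard count comes from the text; the SHA-256 hash, the 64 virtual nodes per node, and the node names are assumptions, and zone-awareness is deliberately left out.

```python
import bisect
import hashlib

NUM_SHARDS = 1024  # from the whitepaper; everything else here is assumed


def _hash64(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def shard_for_key(key: bytes) -> int:
    """Fixed key -> shard mapping; it never changes when nodes come or go."""
    return _hash64(key) % NUM_SHARDS


class Ring:
    """Consistent-hash ring that places shards on nodes via virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (_hash64(f"{node}#{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def owner(self, shard: int) -> str:
        h = _hash64(f"shard-{shard}".encode())
        # First ring point clockwise from the shard's hash, wrapping at the end.
        idx = bisect.bisect_right(self._hashes, h) % len(self._points)
        return self._points[idx][1]


before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b", "node-c", "node-d"])
moved = [s for s in range(NUM_SHARDS) if before.owner(s) != after.owner(s)]
print(f"{len(moved)} of {NUM_SHARDS} shards change owner when node-d joins")
```

The property worth noticing is that every shard that moves lands on the new node; shards never shuffle between existing nodes, which is what keeps a rebalance bounded.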
05 Cache (LIRS)

Why LRU lies — and what we replaced it with.

The LIRS algorithm in detail, the priority-2 eviction-resistance tweak for small hot keys, and the measured hit-rate uplift on Zipfian workloads vs LRU.

Chapter 6 · 4 pages
06 Failure modes

What we test in chaos engineering.

The full failure-injection matrix — single-node loss, full-zone outage, NVMe bit-flip, kernel hang, network partition. Every test case, the expected behaviour, and the RTO/RPO numbers we hit.

Chapter 7–8 · 7 pages
A sneak peek

The kind of detail you'd expect from a paper.

Two pulled-out figures from the document — the write path, and the failure-injection matrix. The full PDF has 18 diagrams of this density.

Figure 03 · Write path

Group-commit WAL with amortized fsync.

The five-stage write pipeline. Application writes land in a per-shard ring buffer; a dedicated commit thread wakes every 50µs (or on batch-full), issues a single fsync, and unparks every blocked writer at once. The 80µs fsync cost is divided across the entire batch: at 2M ops/sec, that's roughly 0.85µs per write for full durability.
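The amortization arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch that assumes every write in a 50µs window shares one fsync and that the full 2M ops/sec flows through a single commit window; the function name and that single-window assumption are ours, not the whitepaper's.

```python
def amortized_fsync_us(fsync_us: float, ops_per_sec: float, window_us: float) -> float:
    """Per-write share of one fsync when writes are batched per commit window."""
    writes_per_batch = ops_per_sec * window_us / 1_000_000
    return fsync_us / writes_per_batch


# 80 µs fsync, 2M ops/sec, 50 µs commit window (figures taken from the text).
per_write = amortized_fsync_us(fsync_us=80, ops_per_sec=2_000_000, window_us=50)
print(f"{per_write:.2f} µs per write")
```

Under these assumptions each batch holds about 100 writes and the share comes out at about 0.8µs, in the same range as the roughly 0.85µs quoted. The exact figure depends on the effective batch size, which this sketch does not model.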

The full chapter covers the lock-free queue design, the futex-park behaviour, the WAL crash-recovery protocol, and a proof that the WAL never loses an acknowledged write under any single-node failure.
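The group-commit pattern itself can be sketched in a few dozen lines. The version below is a simplified illustration, not the engine's design: it uses a lock and a condition variable rather than a lock-free queue and futex parking, and `GroupCommitWAL`, `max_batch`, and the record format are hypothetical. Python timers are also far coarser than 50µs, so the window is only nominal.

```python
import os
import tempfile
import threading


class GroupCommitWAL:
    """Toy group-commit log: many blocked writers, one fsync per batch."""

    def __init__(self, path, window_s=50e-6, max_batch=256):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._cond = threading.Condition()
        self._pending = []            # list of (payload, done_event)
        self._window_s = window_s     # nominal 50 µs commit window
        self._max_batch = max_batch   # hypothetical batch-full threshold
        self._closed = False
        self.fsync_count = 0
        self._thread = threading.Thread(target=self._commit_loop, daemon=True)
        self._thread.start()

    def append(self, payload: bytes) -> None:
        """Block until an fsync covering this payload has returned."""
        done = threading.Event()
        with self._cond:
            self._pending.append((payload, done))
            self._cond.notify()
        done.wait()

    def _commit_loop(self) -> None:
        while True:
            with self._cond:
                # Park until there is work or we are shutting down.
                self._cond.wait_for(lambda: self._pending or self._closed)
                if not self._pending:
                    return
                # Give other writers one commit window to join the batch.
                self._cond.wait_for(
                    lambda: len(self._pending) >= self._max_batch,
                    timeout=self._window_s,
                )
                batch, self._pending = self._pending, []
            os.write(self._fd, b"".join(payload for payload, _ in batch))
            os.fsync(self._fd)        # one fsync covers the whole batch
            self.fsync_count += 1
            for _, done in batch:
                done.set()            # release every blocked writer at once

    def close(self) -> None:
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()
        os.close(self._fd)


path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = GroupCommitWAL(path)
writers = [
    threading.Thread(target=wal.append, args=(f"rec-{i}\n".encode(),))
    for i in range(200)
]
for t in writers:
    t.start()
for t in writers:
    t.join()
wal.close()
print(f"200 durable writes, {wal.fsync_count} fsync calls")
```

No writer returns from `append` before the fsync that covers its record, which is the invariant that lets an acknowledgement mean "durable" while the fsync count stays well below the write count under concurrency.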

FIG 03 · WRITE PATH · POST-ACK · P99 0.21ms
01 · 12µs · App · WAL ring
02 · 80µs · WAL fsync · group-commit
03 · 30µs · VLog append · O_DIRECT
04 · 8µs · Index pointer · ART insert
05 · 5µs · Cache warm · ack
Figure 14 · Chaos matrix

Every failure mode we test, and the SLO it owns.

The chaos-engineering matrix lives in chapter 7. We run each of these injections nightly against a production-equivalent cluster. The chapter documents the exact fault-injection spec, the expected user-observable behaviour, and the recovery time we measured over a 90-day window.

The point of publishing this isn't to say nothing ever fails — it's to say we know what does, and we test for it. When you have an incident, the failure mode you'll hit is almost certainly already in the matrix.

FIG 14 · FAILURE MATRIX · RTO < 30s
single node · kill -9 · RTO 8s
full AZ · network drop · RTO 22s
NVMe corrupt · bit-flip inject · detect <1s
kernel hang · CPU stall · eject 12s
cross-region · QUIC partition · RPO <30s
Table of contents

All 42 pages, in order.

Every section, every figure number, every page. If a chapter is what you're after, we'll send you the chapter — you don't have to take the whole PDF.

VeltrixDB · Technical Reference · v0.9.4-beta

42 pages · 18 figures · 9 chapters
00 · Preface & notation · Audience, scope, and how to read this paper · p. 1
01 · The case for key-value separation · Why LSM compaction is the silent tail-latency killer · p. 4
02 · Storage engine internals · VLog segment format, ART index, garbage collection · p. 9
03 · Kernel I/O: io_uring SQPOLL + O_DIRECT · Submission queues, completion polling, syscall accounting · p. 16
04 · Durability & the 99.999% SLA proof · Group-commit WAL, replication invariants, probability calculus · p. 22
05 · Sharding & placement · 1024-shard consistent hash, zone-awareness, online reshard · p. 26
06 · LIRS cache & small-hot-key resistance · The eviction algorithm and the priority-2 tuning · p. 31
07 · Chaos engineering & failure modes · Fault-injection matrix, RTO/RPO numbers, runbooks · p. 34
08 · Security model · mTLS, AES-256, BYOK, key rotation, audit log · p. 38
09 · Appendix: wire protocol & metrics reference · Binary protocol spec and the 50+ Prometheus metrics · p. 41
Chapter 7 · sample

Six failure modes we test nightly.

A taste of what's in the chaos chapter — every one of these is exercised against a production-equivalent cluster every night, and the results are published to the same Prometheus we expose on customer clusters.

01

Single-node loss

We kill -9 a healthy node at random under sustained 2M ops/sec load. Expected: reads continue from replica within 8 seconds, no acknowledged writes lost. Measured RTO: 8.4s P99.

02

Full availability-zone outage

iptables-drop all traffic from a zone for 30 minutes. Expected: traffic shifts to remaining zones, write quorum maintained on 2/3 zones. Measured RTO: 22s · zero data loss.

03

NVMe corruption

Inject a deliberate bit-flip into a sealed VLog segment. Expected: checksum trips on next read, value re-fetched from replica, bad segment quarantined. Detection < 1s.
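The detect, re-fetch, and quarantine sequence can be sketched as follows. This is an illustrative model only: the text does not say which checksum is used, so CRC32 is an assumption, and the `Node` class, `seal`, and `verify` are hypothetical names.

```python
import zlib


class CorruptRecord(Exception):
    pass


def seal(value: bytes) -> bytes:
    """Prefix the value with a CRC32 so corruption is detectable on read."""
    return zlib.crc32(value).to_bytes(4, "big") + value


def verify(record: bytes) -> bytes:
    stored, value = int.from_bytes(record[:4], "big"), record[4:]
    if zlib.crc32(value) != stored:
        raise CorruptRecord("checksum mismatch")
    return value


class Node:
    def __init__(self):
        self.segments = {}         # segment_id -> {key: sealed record}
        self.quarantined = set()

    def read(self, segment_id, key, replica):
        if segment_id not in self.quarantined:
            try:
                return verify(self.segments[segment_id][key])
            except CorruptRecord:
                self.quarantined.add(segment_id)   # stop serving from this segment
        return verify(replica.segments[segment_id][key])  # re-fetch from the replica


primary, replica = Node(), Node()
for node in (primary, replica):
    node.segments[7] = {b"k": seal(b"hello")}

# Flip one bit in the primary's copy, as the chaos test does.
bad = bytearray(primary.segments[7][b"k"])
bad[-1] ^= 0x01
primary.segments[7][b"k"] = bytes(bad)

print(primary.read(7, b"k", replica), primary.quarantined)
```

The read still returns the correct value, and later reads skip the quarantined segment entirely instead of re-verifying it.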

04

Kernel CPU stall

Stall a CPU running the io_uring SQPOLL thread for 30s. Expected: liveness probe trips, the node is auto-ejected from the read pool, traffic re-routes to healthy nodes. Ejection in 12s.
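A heartbeat-based ejection rule of the kind described can be sketched like this. The `ReadPool` class and the use of 12 seconds as the staleness threshold are assumptions for illustration; the text reports 12s as the measured ejection time, not as a configured value.

```python
import time


class ReadPool:
    """Toy read pool that ejects nodes whose heartbeat has gone stale."""

    def __init__(self, eject_after_s: float, clock=time.monotonic):
        self._eject_after_s = eject_after_s   # hypothetical threshold
        self._clock = clock
        self._last_seen = {}

    def heartbeat(self, node: str) -> None:
        self._last_seen[node] = self._clock()

    def healthy_nodes(self):
        now = self._clock()
        return sorted(
            n for n, seen in self._last_seen.items()
            if now - seen <= self._eject_after_s
        )


# Drive the pool with a fake clock so the example is deterministic.
fake_now = [0.0]
pool = ReadPool(eject_after_s=12.0, clock=lambda: fake_now[0])
pool.heartbeat("node-a")
pool.heartbeat("node-b")

fake_now[0] = 10.0
pool.heartbeat("node-a")      # node-b is stalled and sends nothing

fake_now[0] = 13.0
print(pool.healthy_nodes())   # node-b's last heartbeat is now 13s old
```

A stalled node drops out of the pool without any action on its part, which matters because a node with a hung kernel thread cannot be relied on to report its own failure.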

05

Cross-region partition

Cut QUIC replication between two regions for 5 minutes. Expected: both regions continue serving local reads & writes, conflict resolution applies on heal. RPO < 30s, no acknowledged writes lost.

06

Compaction storm

Force every shard into aggressive GC simultaneously while serving peak load. Expected: GC respects its cgroup weight, P99 reads stay under 8ms. Measured P99 during storm: 6.4ms.

Get the full whitepaper in your inbox.

One email gets you the 42-page PDF — including the chapter you skipped. If you want a specific chapter, mention it in the body and we'll send just that.

Closed source · architecture documented in full · we'll walk a security-review team through any component