My Blog
December 25, 2025

That Time I Tried to Make My Own Database From Scratch

Suvan Gowri Shanker

Why I Wanted to Build My Own Database

It all started when I was trying to build collaborative, local-first applications. I wanted my apps to respond instantly, work offline, and resolve conflicts automatically when devices reconnected. The core problem was building a system that supports optimistic local writes (no locks, no central leader) while guaranteeing that replicas converge deterministically despite partitions, message loss, and device churn. This property is formally known as Strong Eventual Consistency (SEC).

The CRDT Promise

Conflict-free Replicated Data Types (CRDTs) provide the mathematical foundation for this convergence, but taking theory into production exposes gaps: transport assumptions, unbounded metadata growth, and reactivity issues.

The Open Problems I Faced

Beyond the basic question of "does it converge?", I realized a practical CRDT database had to handle the real-world challenges mentioned above: transport assumptions, unbounded metadata growth, and reactivity.

Exploring the Solutions

I spent a lot of time reading papers and evaluating different CRDT approaches. Here's a quick taxonomy of what I explored:

Approach	Core Idea	Trade-off
CvRDTs (state-based)	Join-semilattice merge	Full-state shipping is expensive
CmRDTs (operation-based)	Commutative concurrent operations	Requires stronger dissemination guarantees
δ-CRDTs (delta-state)	Delta-mutators return minimal state	Bandwidth-efficient, preserves causal consistency
Merkle-CRDTs	Merkle-DAG clocks	Decoupled from membership size, requires DAG management
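
To make the first and third rows concrete, here's a minimal Python sketch of a grow-only counter. It's a textbook example, not code from Carnelia, and the names (GCounter, increment, merge) are made up for illustration. The merge is a per-slot maximum, which is the join of the semilattice, and the same merge accepts either a full state or a small delta.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one monotonically increasing slot per replica."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, n=1):
        """Apply a local update and return only the changed slot (the delta)."""
        new_value = self.counts.get(self.replica_id, 0) + n
        self.counts[self.replica_id] = new_value
        return {self.replica_id: new_value}

    def merge(self, other_counts):
        """Join: per-slot maximum. Accepts a full state or a delta."""
        for rid, value in other_counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), value)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
delta_a = a.increment(2)   # returns {"a": 2}
delta_b = b.increment(3)   # returns {"b": 3}

b.merge(delta_a)
b.merge(delta_a)           # duplicate delivery changes nothing
a.merge(b.counts)          # shipping the full state works too, just costs more
```

Because the join is commutative, associative, and idempotent, the two replicas end up with identical state regardless of delivery order or duplication.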

How I Architected Carnelia

After evaluating the pros and cons, I designed the Merkle-Delta CRDT Store (MDCS). I structured it into four tightly integrated layers:

1. δ-CRDT Core

Efficient incremental dissemination using a document-oriented composition of maps, sets, and registers.
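
As a rough illustration of what "document-oriented composition" can look like, here's a sketch of a map whose values are last-writer-wins registers. This is an assumption made for the example: the real core may well use different register semantics (multi-value registers, causal contexts), and LWWMap, set, and merge are hypothetical names. The point is only that a mutator returns the one field it changed, and that delta is what gets disseminated.

```python
class LWWMap:
    """Map of field name -> last-writer-wins register.

    Each register is a (counter, replica_id, value) tuple. The larger
    (counter, replica_id) pair wins, so ties break deterministically.
    """

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.fields = {}
        self.clock = 0  # Lamport-style logical counter

    def set(self, key, value):
        """Delta-mutator: update locally and return only the changed field."""
        self.clock += 1
        entry = (self.clock, self.replica_id, value)
        self.fields[key] = entry
        return {key: entry}

    def merge(self, delta):
        """Join a delta (or a full state) into the local state."""
        for key, incoming in delta.items():
            current = self.fields.get(key)
            if current is None or incoming[:2] > current[:2]:
                self.fields[key] = incoming
            self.clock = max(self.clock, incoming[0])

    def get(self, key):
        entry = self.fields.get(key)
        return entry[2] if entry else None


phone, laptop = LWWMap("phone"), LWWMap("laptop")
d1 = phone.set("title", "Groceries")   # concurrent with d2
d2 = laptop.set("title", "Shopping")
d3 = laptop.set("done", False)

# Deliver the deltas in different orders on each device.
phone.merge(d3)
phone.merge(d2)
laptop.merge(d1)
```

Both devices settle on the same document even though they saw the edits in different orders. Which concurrent write "wins" is arbitrary here (the replica id breaks the tie), which is exactly the kind of policy a real document model has to choose deliberately.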

2. Merkle-Clock Sync

Open membership and recovery via content-addressed DAG synchronization.
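
Here's a small sketch of the idea behind a Merkle-clock, under some simplifying assumptions: nodes are hashed with SHA-256 over a JSON encoding, and "fetching from a peer" is just reading another in-memory object. MerkleClock, append, and pull are illustrative names, not Carnelia's API. Each node's id covers its payload and its parents, so a handful of head ids summarizes the whole history, and a replica can discover what it lacks by walking back from a peer's heads.

```python
import hashlib
import json


def node_id(payload, parents):
    """Content address: hash of the payload plus the sorted parent ids."""
    body = json.dumps({"payload": payload, "parents": sorted(parents)},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


class MerkleClock:
    def __init__(self):
        self.nodes = {}     # id -> (payload, parents)
        self.heads = set()  # current frontier

    def append(self, payload):
        """Record a local event whose parents are the current heads."""
        parents = sorted(self.heads)
        nid = node_id(payload, parents)
        self.nodes[nid] = (payload, parents)
        self.heads = {nid}
        return nid

    def missing_from(self, remote, remote_heads):
        """Walk back from the remote heads and collect unknown node ids."""
        todo, missing, seen = list(remote_heads), [], set()
        while todo:
            nid = todo.pop()
            if nid in self.nodes or nid in seen:
                continue
            seen.add(nid)
            missing.append(nid)
            todo.extend(remote.nodes[nid][1])
        return missing

    def pull(self, remote):
        """Copy missing nodes, then recompute the frontier."""
        for nid in self.missing_from(remote, remote.heads):
            self.nodes[nid] = remote.nodes[nid]
        # Heads are the nodes that no stored node names as a parent.
        referenced = {p for _, parents in self.nodes.values() for p in parents}
        self.heads = set(self.nodes) - referenced


a, b = MerkleClock(), MerkleClock()
a.append({"set": ["x", 1]})
b.pull(a)                      # b catches up, knowing nothing about a beforehand

a.append({"set": ["x", 2]})    # concurrent edits on both sides
b.append({"set": ["y", 9]})
a.pull(b)
b.pull(a)
```

Neither replica needs a membership list or a per-replica vector: comparing head ids is enough to tell whether two histories match, and the walk stops as soon as it reaches a node that is already known.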

3. Stability-Guided Compaction

Bounded metadata growth through principled pruning when intervals are acknowledged.
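
A simplified way to picture this: keep outgoing deltas in a numbered buffer, track how far each peer has acknowledged, and drop the prefix that everyone has confirmed. The sketch below assumes a fixed, known peer set and whole-number sequence acks, which is a simplification (open membership makes "everyone" harder to define). DeltaLog and its methods are hypothetical names.

```python
class DeltaLog:
    """Buffer of outgoing deltas, indexed by a local sequence number."""

    def __init__(self, peers):
        self.next_seq = 0
        self.deltas = {}                     # seq -> delta
        self.acked = {p: 0 for p in peers}   # peer -> next seq it still needs

    def record(self, delta):
        self.deltas[self.next_seq] = delta
        self.next_seq += 1

    def interval_for(self, peer):
        """Deltas the peer has not acknowledged yet."""
        start = self.acked[peer]
        return [self.deltas[s] for s in range(start, self.next_seq)]

    def ack(self, peer, up_to):
        """Peer confirms it has merged every delta with seq < up_to."""
        self.acked[peer] = max(self.acked[peer], up_to)

    def compact(self):
        """Drop the stable prefix: deltas every known peer has acknowledged."""
        stable = min(self.acked.values(), default=self.next_seq)
        for seq in [s for s in self.deltas if s < stable]:
            del self.deltas[seq]
        return stable


log = DeltaLog(["b", "c"])
for d in ({"x": 1}, {"y": 2}, {"z": 3}):
    log.record(d)

log.ack("b", 3)          # b has everything
log.ack("c", 1)          # c has only the first delta
stable = log.compact()   # only seq 0 is safe to drop
```

The slowest peer sets the pace: nothing past its acknowledgement can be pruned, so a device that stays offline for a long time holds back compaction until it returns or is handled some other way.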

4. Reactivity Guardrails

Exposing buffered operations to avoid waiting on unrelated updates.

The Two-Layer Sync Model

One of the most interesting parts of the build was the synchronization architecture. It operates on two complementary layers:

Layer A (Delta-interval anti-entropy): Uses delta-state anti-entropy with acknowledgements for causal merging and garbage collection.
Layer B (Merkle-Clock summaries): Uses Merkle-Clock roots as compact frontier identifiers for discovery and gap repair.

Key Design Decision

The Merkle-DAG handles discovery and gap repair, while delta-interval acknowledgements provide causal merging discipline and garbage collection eligibility. This hybrid approach gave me the best of both worlds.
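
One way to read this division of labor is as a per-peer decision: compare roots first, use the cheap delta path when the peer's missing interval is still buffered, and fall back to walking the DAG otherwise. The function below is my own guess at that control flow, written as a sketch. plan_sync, its parameters, and the SyncAction names are hypothetical and not taken from Carnelia.

```python
from enum import Enum


class SyncAction(Enum):
    NONE = "already in sync"
    DELTA_INTERVAL = "send buffered delta interval"
    DAG_REPAIR = "walk the Merkle-DAG to repair the gap"


def plan_sync(local_root, remote_root, peer_acked_seq, oldest_buffered_seq):
    """Pick a sync path for one peer.

    peer_acked_seq: next delta sequence number the peer needs,
        or None if this peer has never acknowledged anything.
    oldest_buffered_seq: smallest sequence number still in the delta buffer.
    """
    if local_root == remote_root:
        return SyncAction.NONE
    if peer_acked_seq is not None and peer_acked_seq >= oldest_buffered_seq:
        # Everything the peer lacks is still buffered: use the cheap path.
        return SyncAction.DELTA_INTERVAL
    # New member, or its interval was already compacted away.
    return SyncAction.DAG_REPAIR


same = plan_sync("root-1", "root-1", 4, 2)
recent = plan_sync("root-1", "root-2", 5, 3)
stale = plan_sync("root-1", "root-2", 1, 3)
newcomer = plan_sync("root-1", "root-2", None, 0)
```

Under this reading, compaction never strands a slow or brand-new peer: once its deltas are gone from the buffer, the DAG is still there to repair the gap.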

Looking Back: Merits and Trade-offs

Building Carnelia was a massive learning experience. On the plus side, my approach avoids strict dissemination assumptions. Deltas are merged with an idempotent join, so duplicated messages are handled gracefully. The open membership model works well for dynamic sets of devices, and the metadata compaction is principled.

However, the trade-offs are real. Multiple layers mean careful integration and deep testing. The Merkle history keeps growing unless it is snapshotted, and building explicit structures over a primarily non-serializable base demands careful invariant design. In the end, while it isn't perfect, it's a solid foundation for the resilient, local-first apps I set out to build!
