How Databases Search a Billion Rows
Nearly every database you use is built on B-trees. See how they minimize disk I/O to search billions of rows efficiently.
Your database has a billion rows. You run a query. The answer comes back in milliseconds. How?
Not with binary search. Your data lives on disk, and disk reads come in fixed-size pages — four to sixteen kilobytes each. A spinning disk takes about ten milliseconds per read. Binary search on a billion records means about thirty comparisons (log₂ 10⁹ ≈ 30), and each comparison lands on a different page, so ~30 page reads. On a spinning disk, that's 300 milliseconds for a single query.
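That back-of-envelope arithmetic can be checked in a few lines, taking the ten-millisecond seek time above as the assumed constant:

```python
import math

ROWS = 1_000_000_000
SEEK_MS = 10  # assumed per-read latency for a spinning disk, as in the text

# Binary search probes a different page on (almost) every step,
# so every comparison costs a full disk read.
reads = math.ceil(math.log2(ROWS))
total_ms = reads * SEEK_MS
print(reads, total_ms)  # 30 reads, 300 ms
```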
The bottleneck isn't comparisons — your CPU handles millions per second. The bottleneck is disk I/O. What if each node could hold not one key, but hundreds?
One Node, One Disk Page
A B-tree node holds many keys in sorted order, with child pointers between them. The critical design choice: each node is sized to fill exactly one disk page.
One disk read loads hundreds of keys into RAM. Binary search within that page is essentially free. Find which range your target falls in, follow the child pointer, read the next page.
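A lookup can be sketched as a descent that does a cheap in-memory binary search inside each node, charging one simulated disk read per node touched. This is a toy model: the `Node` objects stand in for on-disk pages.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted keys within this "page"
        self.children = children  # len(keys) + 1 children, or None for a leaf

def search(node, target, reads=0):
    """Return (found, pages_read): one disk read per node visited."""
    reads += 1                           # loading this node = one page read
    i = bisect_right(node.keys, target)  # binary search inside the page: ~free
    if i > 0 and node.keys[i - 1] == target:
        return True, reads
    if node.children is None:
        return False, reads
    return search(node.children[i], target, reads)

# A two-level toy tree: root [10, 20] with three leaf pages.
root = Node([10, 20], [Node([1, 5]), Node([12, 15]), Node([25, 30])])
print(search(root, 15))  # (True, 2): found after two page reads
```

Real implementations use wider nodes and read pages from disk, but the cost model is the same: reads grow with tree height, not with row count.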
| Keys per node | Capacity at 2 levels | At 3 levels | At 4 levels |
|---|---|---|---|
| 500 | 250,000 | 125 million | 62.5 billion |
A billion rows? Three to four disk reads. That's it. This is why PostgreSQL, MySQL, and SQLite all use B-trees as their default index structure.
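The table's capacities are just powers of the fan-out. A quick check, treating the fan-out as equal to the key count per node (an approximation — a node with 500 keys actually has 501 children — but it's the one the table uses):

```python
import math

KEYS_PER_NODE = 500  # fan-out approximated by keys per node, as in the table

for levels in (2, 3, 4):
    print(levels, KEYS_PER_NODE ** levels)
# 2 250000
# 3 125000000
# 4 62500000000

# Levels needed to cover a billion keys:
height = math.ceil(math.log(1_000_000_000, KEYS_PER_NODE))
print(height)  # 4
```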
Splits: Growing from the Top
Inserting a key is straightforward — slot it into sorted position within a node. Until the node is full.
When a node overflows, it splits. The node breaks into two halves, and the median key gets pushed up to the parent. If the parent is full, it splits too. The cascade continues upward until it finds room — or the root itself splits.
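The split itself is simple when viewed on a single node's sorted key list. A sketch, with parent bookkeeping and child pointers omitted:

```python
def split(keys):
    """Split an overfull node: (left sibling, median to push up, right sibling)."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

left, median, right = split([3, 9, 14, 21, 30])
print(left, median, right)  # [3, 9] 14 [21, 30]
```

If pushing `median` into the parent overflows it too, the parent splits the same way — that is the upward cascade described above.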
When the root splits, a new root is created above it. The tree grows from the top, not the bottom. This guarantees something remarkable: every leaf is always at the same depth. Not approximately. Not eventually. Mathematically guaranteed on every insert and every delete.
That means the number of disk reads for any lookup is bounded by the height of the tree. Predictable. Small. Consistent.
Why It's Even Faster in Practice
That root node your database keeps? It's cached in RAM. The second level usually is too — they're accessed so frequently they almost always live in the page cache.
So in practice, finding any row among a billion takes two, maybe three actual disk reads. Not thirty. Two.
Most databases actually use a variant called a B+ tree, where all data lives in the leaves and the leaves are linked together. Need a range query? Find the start, then walk sideways through the leaf chain. No jumping back to the root.
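The sideways walk can be sketched with linked leaf pages. A toy model, assuming we have already descended from the root to the leaf holding the start of the range:

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf page
        self.next = None   # link to the right-hand sibling leaf

def range_scan(start_leaf, lo, hi):
    """Collect all keys in [lo, hi] by walking the leaf chain left to right."""
    out = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out   # past the range: stop, no trip back to the root
            if k >= lo:
                out.append(k)
        leaf = leaf.next     # hop sideways to the next leaf page
    return out

# Three linked leaves: [1, 3] -> [5, 7] -> [9, 11]
a, b, c = Leaf([1, 3]), Leaf([5, 7]), Leaf([9, 11])
a.next, b.next = b, c
print(range_scan(a, 3, 9))  # [3, 5, 7, 9]
```

One descent to find the start, then pure sequential reads — which is exactly the access pattern disks are fastest at.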
| Feature | Binary Search Tree | B-Tree |
|---|---|---|
| Keys per node | 1 | Hundreds |
| Disk reads per lookup | ~30 (billion rows) | 3-4 |
| Balance guarantee | Requires rebalancing | Always perfectly balanced |
| Optimized for | RAM | Disk page I/O |
B-trees aren't just a data structure. They're the bridge between your data and the physical reality of storage hardware. Next time your query returns instantly, you'll know what's working underneath.
Watch the full animated breakdown: B-Trees: How Your Database Finds Data
