Friday, April 1, 2011

Announcing the NuMap Storage Engine for MongoDB

NuMap is an alternative storage engine for use with MongoDB 1.9+. It offers dramatically increased performance for high-load applications.

Enabling

To enable the NuMap storage engine, use the --numap command line option.

Description

Earlier versions of MongoDB used an mmap-based storage engine. In this model, database files are mapped into memory. When a transaction modifies an object in the database, it modifies the mapped memory page directly, and the system periodically flushes dirty pages to disk.
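To make the mmap model concrete, here is a minimal sketch using Python's standard `mmap` module. The `update_in_place` helper and the four-page scratch file are hypothetical illustrations, not MongoDB's implementation: an update writes straight into the mapped page, and `flush()` asks the operating system to write the dirty pages back to the file.

```python
import mmap
import os
import tempfile

# Hypothetical helper illustrating the mmap model; this is not MongoDB code.
def update_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset` through a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:        # length 0 maps the whole file
            mm[offset:offset + len(data)] = data    # modify the mapped page directly
            mm.flush()                              # write dirty pages back to disk

# Usage: a scratch four-page "database file", updated inside its last page.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x00" * (4 * mmap.PAGESIZE))
update_in_place(path, 3 * mmap.PAGESIZE + 10, b"hello")
with open(path, "rb") as f:
    f.seek(3 * mmap.PAGESIZE + 10)
    print(f.read(5))  # b'hello'
os.remove(path)
```

Because each update lands wherever its object happens to live in the file, a batch of updates dirties pages scattered across the whole mapping.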

This has the side effect of generating a lot of random I/O. Over any given period, transactions may modify objects scattered throughout the database file, so when those changes are flushed, the dirty pages map to essentially random locations on disk.

A common alternative is to use append-only files. In this model, modifications to objects are always added to the end of the database's data files, so flushing changes to disk only ever appends to the end of a file. This leads to higher throughput because the disk spends less time seeking.
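The append-only model can be sketched the same way. In this minimal, hypothetical `AppendOnlyStore` (the JSON-lines record format and the in-memory offset index are assumptions made for illustration), every `put()` is a sequential write at the end of the file, and reads follow the index to the newest copy of a key.

```python
import json
import os
import tempfile

class AppendOnlyStore:
    """Hypothetical append-only key/value file (illustration only).

    put() always appends a new JSON-lines record at the end of the file; an
    in-memory index remembers the byte offset of the newest record per key.
    """

    def __init__(self, path: str):
        self.path = path
        self.index = {}  # key -> byte offset of the newest record for that key

    def put(self, key: str, value) -> None:
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)   # writes only ever land at the tail
            offset = f.tell()
            f.write(record)
        self.index[key] = offset     # older copies of the key stay in the file

    def get(self, key: str):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]

fd, path = tempfile.mkstemp()
os.close(fd)
store = AppendOnlyStore(path)
store.put("user:1", {"name": "Ann"})
store.put("user:1", {"name": "Anne"})  # an update appends a second copy
print(store.get("user:1"))             # {'name': 'Anne'}
os.remove(path)
```

Note that the first record for `user:1` is still sitting in the file after the update; nothing in this design ever reclaims it.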

However, append-only data files use space inefficiently. Deleting an object does not free any space on disk, and every modification writes another copy of the object, so the file accumulates multiple copies of it. With append-only files, a process must periodically scan the data files and clean out unused data (a step usually called compaction). This is time-consuming and I/O-intensive.
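The cleanup pass can be sketched as a function over a list of records. This is a hypothetical, in-memory simplification: each record is a `(key, value)` pair, and a value of `None` stands for a delete (a "tombstone"). Because a later record supersedes every earlier one for the same key, the pass has to read the entire log before it can write out the live data, which is why compaction is so I/O-intensive.

```python
from typing import Dict, List, Optional, Tuple

# A record is (key, value); a value of None is a tombstone marking a delete.
Record = Tuple[str, Optional[str]]

def compact(log: List[Record]) -> List[Record]:
    """Return a new log holding only the newest live value of each key.

    Hypothetical sketch: the whole log is scanned, because a later record
    for a key supersedes every earlier one, including deletes.
    """
    latest: Dict[str, Optional[str]] = {}
    for key, value in log:
        latest[key] = value          # newest record for each key wins
    return [(k, v) for k, v in latest.items() if v is not None]

log = [("a", "1"), ("b", "1"), ("a", "2"), ("b", None), ("c", "3")]
print(compact(log))  # [('a', '2'), ('c', '3')]
```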

The NuMap storage engine takes a new approach to efficient database storage: write-only memory. NuMap works by clearing each write as soon as it is written, eliminating the need for any disk access at all. This write-once, read-never data structure provides incredibly high read/write performance, along with durability, compression and replication benefits.
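For readers who want to model write-only memory themselves, here is a tongue-in-cheek sketch. `WriteOnlyMemory` is a hypothetical class, not NuMap's actual code: each write is cleared the moment it is made, so every operation is O(1) and the object holds a single cleared cell no matter how much data is written.

```python
class WriteOnlyMemory:
    """Hypothetical model of write-only memory (illustration only)."""

    CLEARED = None  # the value every read returns

    def __init__(self):
        self._cell = self.CLEARED  # the only storage the object ever holds

    def write(self, key, value) -> None:
        self._cell = value         # write once...
        self._cell = self.CLEARED  # ...and clear immediately: O(1)

    def read(self, key):
        return self._cell          # reads never see the data: O(1)

wom = WriteOnlyMemory()
for i in range(10_000):
    wom.write(i, {"n": i})
print(wom.read(42))  # None
```

Every reader receives the same cleared response, whatever key it asks for.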

Performance

Insert, Update, Delete and Read operations on the NuMap storage engine all run in constant time, O(1). Inserts are slightly slower than other operations, because the storage engine must clear registers and memory pages before committing the transaction. Reads are extremely efficient, since all responses are the same constant size and multiple connections can share a single response.

Here's a simple benchmark comparing the time taken to perform 10,000 inserts on the mmap and NuMap storage engines:

numap:~ root$ time ./insert_10000_mmap.sh
connecting to: test

real 0m0.984s
user 0m0.876s
sys 0m0.100s

numap:~ root$ time ./insert_10000_numap.sh

real 0m0.545s
user 0m0.418s
sys 0m0.127s

As you can see, the NuMap storage engine is nearly 2x as fast as mmap for inserts (0.545s vs. 0.984s of real time).
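The speedup follows directly from the `real` (wall-clock) times in the two runs above, both overall and per insert:

```latex
\text{speedup} = \frac{t_{\text{mmap}}}{t_{\text{NuMap}}} = \frac{0.984\ \text{s}}{0.545\ \text{s}} \approx 1.81,
\qquad
\frac{0.984\ \text{s}}{10{,}000} = 98.4\ \mu\text{s}
\quad\text{vs.}\quad
\frac{0.545\ \text{s}}{10{,}000} = 54.5\ \mu\text{s per insert}
```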

Durability

Writes are guaranteed to be cleared as soon as they return. Clients will read the cleared result immediately, even after machine failure. In fact, a client can read cleared writes even during machine failures.

Compression

Since writes to write-only memory are cleared immediately, the storage engine can store an arbitrary number of writes in constant space. This means compression actually becomes more effective as more data is written.

Replication

Since all reads return only cleared writes, writes are immediately consistent on all replicas. Even disconnected and failed nodes will return consistent reads.