I read the latest DynamoDB paper so you don't have to.
Intro
The latest DynamoDB paper was presented at USENIX ATC 2022. When the original Dynamo paper, which inspired DynamoDB, was published back in 2007, it was groundbreaking.
Takeaways
During Prime Day 2021, DynamoDB received trillions of API requests from Amazon systems, peaking at 89.2 million requests per second.
DynamoDB uses a multi-tenant design: data from different customers is stored on the same physical machines to drive down costs.
Although DynamoDB is a NoSQL database, customers can request strongly consistent reads of items in a table. By default, reads are eventually consistent.
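To make the difference concrete, here is a toy Python model (not DynamoDB's code) of one partition with a leader and two followers that can lag behind it. An eventually consistent read may land on any replica and return a stale value, while a strongly consistent read goes to the leader. The `Partition` class, its `consistent_read` flag, and the simulated lag are illustrative assumptions; in the real API the switch is the `ConsistentRead` parameter on reads such as `GetItem`.

```python
import random


class Partition:
    """Toy partition: a leader plus follower replicas that may lag behind it."""

    def __init__(self, n_followers: int = 2):
        self.leader: dict = {}
        self.followers: list = [{} for _ in range(n_followers)]

    def put(self, key, value, replicated_to):
        # The leader always applies the write; only the followers listed in
        # `replicated_to` have caught up so far (simulated replication lag).
        self.leader[key] = value
        for i in replicated_to:
            self.followers[i][key] = value

    def get(self, key, consistent_read=False, rng=random):
        if consistent_read:
            # Strongly consistent: the leader is the replica known to be up to date.
            return self.leader.get(key)
        # Eventually consistent: any replica may answer, possibly with stale data.
        return rng.choice([self.leader, *self.followers]).get(key)


p = Partition()
p.put("cart#42", "v1", replicated_to=[0, 1])
p.put("cart#42", "v2", replicated_to=[0])  # follower 1 still holds "v1"
print(p.get("cart#42", consistent_read=True))  # v2
```

An eventually consistent `p.get("cart#42")` can return either `"v1"` or `"v2"`, depending on which replica answers.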
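A DynamoDB table's primary key is either a partition key alone or a partition key plus a sort key. DynamoDB hashes the partition key to decide which partition stores an item, and items that share a partition key are stored in sort-key order.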
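A small sketch of how the two key parts divide the work: the partition key is hashed to pick a partition, and items sharing a partition key are kept ordered by sort key, so a range query comes back sorted. The MD5 hash, the fixed `NUM_PARTITIONS`, and the `Table` class are assumptions for illustration; DynamoDB's internal hash function and partition management are not exposed.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB splits and manages partitions itself


def partition_for(partition_key: str) -> int:
    # Hash the partition key into a 64-bit space, then map contiguous
    # ranges of that space onto partitions.
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")
    return (h * NUM_PARTITIONS) >> 64


class Table:
    def __init__(self):
        # One dict per partition: partition key -> (sorted sort keys, sort key -> item)
        self.partitions = [defaultdict(lambda: ([], {})) for _ in range(NUM_PARTITIONS)]

    def put(self, pk: str, sk: str, item: dict) -> None:
        sort_keys, items = self.partitions[partition_for(pk)][pk]
        if sk not in items:
            bisect.insort(sort_keys, sk)  # keep sort keys ordered on insert
        items[sk] = item

    def query(self, pk: str, sk_prefix: str = "") -> list:
        # Every item for one partition key lives in one partition, already sorted.
        sort_keys, items = self.partitions[partition_for(pk)].get(pk, ([], {}))
        return [items[sk] for sk in sort_keys if sk.startswith(sk_prefix)]


t = Table()
t.put("user#1", "order#2024-02", {"total": 5})
t.put("user#1", "order#2024-01", {"total": 3})
t.put("user#1", "profile", {"name": "Ada"})
print(t.query("user#1", "order#"))  # [{'total': 3}, {'total': 5}]
```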
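DynamoDB supports ACID transactions across multiple items without giving up the consistency and availability trade-offs (its position under the CAP theorem) that it makes for ordinary reads and writes.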
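The all-or-nothing part of a transaction can be sketched in a few lines: check every item's condition first, and only if all of them pass, apply every write. This single-process Python model loosely mirrors the shape of a `TransactWriteItems` request (a list of writes with optional conditions), but it says nothing about how DynamoDB coordinates transactions across storage nodes or isolates concurrent ones; `Write`, `transact_write`, and `TransactionCanceled` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class TransactionCanceled(Exception):
    pass


@dataclass
class Write:
    key: str
    value: Optional[dict]  # None means delete the item
    # Condition evaluated against the item's current value (None if absent).
    condition: Callable[[Optional[dict]], bool] = lambda current: True


def transact_write(store: dict, writes: list) -> None:
    keys = [w.key for w in writes]
    if len(keys) != len(set(keys)):
        # DynamoDB likewise rejects a transaction that targets one item twice.
        raise ValueError("each item may appear only once per transaction")
    # Phase 1: evaluate every condition before changing anything.
    for w in writes:
        if not w.condition(store.get(w.key)):
            raise TransactionCanceled(f"condition failed for {w.key}")
    # Phase 2: all conditions held, so apply every write.
    for w in writes:
        if w.value is None:
            store.pop(w.key, None)
        else:
            store[w.key] = w.value


accounts = {"acct#A": {"balance": 100}, "acct#B": {"balance": 0}}
has_funds = lambda cur: cur is not None and cur["balance"] >= 30
transact_write(accounts, [Write("acct#A", {"balance": 70}, has_funds),
                          Write("acct#B", {"balance": 30})])
```

If any condition fails, `TransactionCanceled` is raised before a single write is applied, so the store is left exactly as it was.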
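DynamoDB splits each table into partitions, and the replicas of each partition form a replication group. The replication group uses Multi-Paxos for leader election and consensus. Only the current leader replica can serve writes and strongly consistent reads. On a write, the leader appends a record to its write-ahead log and sends it to its peers in the group; the write is acknowledged once a quorum of replicas has persisted the record.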
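Below is a simplified Python model of that write path: the leader appends a record to its write-ahead log, sends it to its peers, and acknowledges the write only once a majority has persisted it. Leader election (Multi-Paxos) is assumed to have already happened, and `Replica`, `ReplicationGroup`, and the `healthy` flag are illustrative names rather than anything from DynamoDB.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    healthy: bool = True
    wal: list = field(default_factory=list)  # write-ahead log records

    def persist(self, record) -> bool:
        # A replica acknowledges only after appending the record to its log.
        if not self.healthy:
            return False
        self.wal.append(record)
        return True


class ReplicationGroup:
    def __init__(self, replicas):
        self.replicas = replicas
        self.leader = replicas[0]  # assume the leader was already elected
        self.quorum = len(replicas) // 2 + 1  # majority, e.g. 2 of 3
        self.data = {}  # leader's applied state, used for strongly consistent reads
        self.next_lsn = 1

    def write(self, key, value) -> bool:
        record = (self.next_lsn, key, value)
        self.next_lsn += 1
        # The leader persists locally first, then replicates to its peers.
        if not self.leader.persist(record):
            return False
        acks = 1 + sum(r.persist(record) for r in self.replicas[1:])
        if acks < self.quorum:
            # Not acknowledged to the client. A real protocol must also resolve
            # records that reached only a minority; this sketch ignores that.
            return False
        self.data[key] = value
        return True

    def strongly_consistent_read(self, key):
        # Only the leader serves strongly consistent reads.
        return self.data.get(key)


group = ReplicationGroup([Replica("a"), Replica("b"), Replica("c")])
group.write("k", 1)
```

With three replicas the quorum is two, so losing one follower still lets writes succeed, while losing two does not.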
DynamoDB uses admission control to keep storage nodes from being overloaded. It tracks the throughput allocated to every partition on a storage node so that the total never exceeds the maximum throughput the node's storage drives can sustain.
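A minimal sketch of the two checks, assuming throughput is measured in abstract request units: a storage node refuses to host a partition if the total allocated throughput would exceed the node's limit, and each hosted partition is rate-limited with a token bucket. The token bucket is a standard rate-limiting technique chosen here for illustration; the class names, partition IDs, and numbers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float        # request units added per second (the partition's allocation)
    capacity: float    # maximum tokens that can accumulate
    tokens: float      # tokens currently available
    last: float = 0.0  # time of the last refill

    def try_consume(self, units: float, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + max(0.0, now - self.last) * self.rate)
        self.last = now
        if self.tokens >= units:
            self.tokens -= units
            return True
        return False  # over budget: the request would be throttled


@dataclass
class StorageNode:
    max_throughput: float  # hypothetical limit derived from the node's drives
    buckets: dict = field(default_factory=dict)  # partition id -> TokenBucket

    def allocated(self) -> float:
        return sum(b.rate for b in self.buckets.values())

    def place_partition(self, pid: str, throughput: float) -> bool:
        # Admission check: total allocation must stay within the node's capacity.
        if self.allocated() + throughput > self.max_throughput:
            return False
        self.buckets[pid] = TokenBucket(rate=throughput, capacity=throughput, tokens=throughput)
        return True

    def admit(self, pid: str, units: float, now: float) -> bool:
        return self.buckets[pid].try_consume(units, now)


node = StorageNode(max_throughput=1000)
node.place_partition("tenantA#p1", 600)  # accepted
node.place_partition("tenantB#p7", 500)  # rejected: 600 + 500 > 1000
```

Because partitions from different customers share a node, the node-level sum is what keeps one tenant's allocation from eating into another's.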
DynamoDB archives its write-ahead logs to S3, so committed data lost on a storage node can be recovered by replaying the archived log.
DynamoDB continuously verifies data at rest through the use of checksums.
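One way checksums can be used to verify data at rest is sketched below in Python: compute a deterministic checksum of each replica's contents and compare it with one computed from a reference copy, for example a copy rebuilt by replaying the archived log. SHA-256 over sorted JSON is my assumption, as are the function names; this is not DynamoDB's actual on-disk format or verification process.

```python
import hashlib
import json


def replica_checksum(items: dict) -> str:
    # Hash items in sorted key order so identical contents give identical digests.
    h = hashlib.sha256()
    for key in sorted(items):
        h.update(json.dumps([key, items[key]], sort_keys=True).encode())
        h.update(b"\n")
    return h.hexdigest()


def find_corrupt_replicas(replicas: dict, reference: dict) -> list:
    # Names of replicas whose checksum differs from the reference copy.
    expected = replica_checksum(reference)
    return [name for name, items in replicas.items() if replica_checksum(items) != expected]


replicas = {"r1": {"k": 1, "j": 2}, "r2": {"j": 2, "k": 1}, "r3": {"k": 1, "j": 3}}
print(find_corrupt_replicas(replicas, reference={"k": 1, "j": 2}))  # ['r3']
```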
DynamoDB supports backups and restores that do not affect table performance, because they are built from the write-ahead logs archived in S3.
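A point-in-time restore can be sketched as: take the latest snapshot taken at or before the requested time, then replay archived log records up to that time. Since everything is read from the archive, the live table is never touched. The sketch assumes periodic snapshots are kept alongside the archived log and uses in-memory lists in place of S3; `restore_to` and the record format are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[int, Dict[str, object]]       # (timestamp, table contents)
LogRecord = Tuple[int, str, Optional[object]]  # (timestamp, key, value or None for delete)


def restore_to(snapshots: List[Snapshot], log: List[LogRecord], target_ts: int) -> dict:
    # Both inputs are assumed sorted by timestamp.
    idx = bisect.bisect_right([ts for ts, _ in snapshots], target_ts) - 1
    if idx >= 0:
        base_ts, state = snapshots[idx][0], dict(snapshots[idx][1])
    else:
        # No snapshot old enough: start empty and replay from the first record,
        # which is only valid if the log goes back to table creation.
        base_ts, state = float("-inf"), {}
    # Replay only records newer than the snapshot and not after the target time.
    for ts, key, value in log:
        if base_ts < ts <= target_ts:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


snapshots = [(10, {"a": 1}), (20, {"a": 2, "b": 1})]
log = [(11, "a", 2), (15, "b", 1), (21, "a", 3), (25, "b", None)]
print(restore_to(snapshots, log, 22))  # {'a': 3, 'b': 1}
```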
A DynamoDB partition remains available as long as enough healthy replicas remain to form a write quorum and elect a leader.
DynamoDB can still function while dependencies such as AWS IAM and AWS KMS are not available.
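As I understand the paper, this works because results from IAM and KMS are cached and can keep being used for an extended period when those services are impaired. A minimal Python sketch of such a cache: an entry is refreshed after `refresh_after` seconds, but if the dependency call fails, a cached entry younger than `max_staleness` is still served. The class name, parameters, and the synchronous refresh are assumptions; a production cache would more likely refresh in the background.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class StaleOnErrorCache:
    def __init__(self, fetch: Callable, refresh_after: float, max_staleness: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch                  # call to the dependency, e.g. a policy lookup
        self.refresh_after = refresh_after  # seconds before a refresh is attempted
        self.max_staleness = max_staleness  # seconds a cached value may be used on failure
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[object, float]] = {}

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.refresh_after:
            return entry[0]  # fresh enough: no call to the dependency
        try:
            value = self.fetch(key)
        except Exception:
            # Dependency unavailable: keep serving the cached value if it is not too old.
            if entry is not None and now - entry[1] < self.max_staleness:
                return entry[0]
            raise
        self.entries[key] = (value, now)
        return value
```

The gap between `refresh_after` and `max_staleness` is the window in which requests keep being served while the dependency is down.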
Outro
If you want to learn more about database internals, I would recommend learning about B-trees and commit protocols such as three-phase commit (3PC).
Also, I started a newsletter.