The DHT
Distributed Hash Tables provide a decentralized way to store and retrieve data across a network of peers.
What is a DHT?
A Distributed Hash Table (DHT) is a distributed system that provides a lookup service similar to a hash table: key-value pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. The responsibility for maintaining the mapping from keys to values is distributed among the nodes, so that a change in the set of participants causes a minimal amount of disruption.
In libp2p, the DHT serves two critical functions:
- Peer Routing: Finding the network addresses of peers by their Peer ID
- Content Routing: Finding peers that have specific content (provider records)
Kademlia DHT
libp2p uses a variant of the Kademlia DHT protocol, originally designed by Petar Maymounkov and David Mazières. Kademlia is well-suited for peer-to-peer networks because it minimizes the number of configuration messages nodes must send to learn about each other.
XOR Distance Metric
Kademlia uses an XOR-based distance metric to determine how "close" two nodes or keys are to each other. The distance between two identifiers is calculated as:
distance(A, B) = A XOR B
This distance metric has several important properties:
- Identity: distance(A, A) = 0. A node is zero distance from itself.
- Symmetry: distance(A, B) = distance(B, A). The distance is the same in both directions.
- Triangle Inequality: distance(A, B) + distance(B, C) >= distance(A, C). Going through an intermediate node is never shorter than the direct distance.
The XOR metric creates a well-defined topology where each node has a consistent view of the network, and lookups converge predictably toward the target.
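As a concrete illustration, the sketch below compares two peers' distances to a key in Go. The xorDistance helper and the sample IDs are illustrative, not a libp2p API; libp2p's Kademlia compares 256-bit SHA-256 hashes in essentially this way.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// xorDistance treats two equal-length IDs as big integers and returns
// their bitwise XOR, the Kademlia distance between them.
// (Illustrative helper; not a libp2p API.)
func xorDistance(a, b []byte) *big.Int {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return new(big.Int).SetBytes(d)
}

func main() {
	// In libp2p's Kademlia, positions in the keyspace are SHA-256 hashes.
	peerA := sha256.Sum256([]byte("peer-A"))
	peerB := sha256.Sum256([]byte("peer-B"))
	key := sha256.Sum256([]byte("some-key"))

	// Smaller XOR distance means "closer" in the keyspace.
	if xorDistance(peerA[:], key[:]).Cmp(xorDistance(peerB[:], key[:])) < 0 {
		fmt.Println("peer-A is closer to the key")
	} else {
		fmt.Println("peer-B is closer to the key")
	}
}
```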
Routing Table and K-Buckets
Each node in the Kademlia DHT maintains a routing table organized into "k-buckets." The routing table is structured based on the XOR distance from the node's own ID:
- Bucket 0 contains nodes that differ in the most significant bit
- Bucket 1 contains nodes that match in the first bit but differ in the second
- Bucket n contains nodes that match in the first n bits but differ in bit n+1
Each k-bucket can hold up to k entries (typically k=20 in libp2p). This structure ensures that:
- Nodes know more about peers that are "closer" to them in XOR space
- Nodes have at least some knowledge of every region of the ID space
- Lookups require at most O(log n) hops to find any node in a network of n nodes
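Because of this layout, a peer's bucket index is just the length of the ID prefix it shares with the local node, which equals the number of leading zero bits in their XOR distance. A minimal sketch (the bucketIndex helper is illustrative, not a libp2p API):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketIndex returns the k-bucket a remote ID belongs in: the number
// of leading bits it shares with the local ID, i.e. the number of
// leading zeros in their XOR distance. (Illustrative helper.)
func bucketIndex(local, remote []byte) int {
	for i := range local {
		if x := local[i] ^ remote[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return len(local) * 8 // identical IDs; never stored in practice
}

func main() {
	local := []byte{0b10110000}
	remote := []byte{0b00110000}                // differs in the most significant bit
	fmt.Println(bucketIndex(local, remote[:])) // prints 0, i.e. bucket 0
}
```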
The Lookup Process
When a node wants to find the closest nodes to a particular key (for peer routing or content routing), it performs an iterative lookup:
- The node selects the α (alpha, typically 3) closest nodes to the target from its own routing table
- It sends parallel FIND_NODE requests to these nodes
- Each contacted node responds with the k closest nodes it knows to the target
- The querying node updates its list of closest nodes and repeats the process
- The lookup terminates when the closest k nodes have been queried and no closer nodes are returned
This process converges quickly because each step typically halves the distance to the target.
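The sketch below captures the core of this loop, assuming a hypothetical findNode callback that plays the role of the FIND_NODE RPC; real implementations add parallel dispatch, timeouts, and failure handling.

```go
package kademlia

import "sort"

const (
	alpha = 3  // lookup parallelism
	k     = 20 // number of closest peers a lookup returns
)

// ID is a stand-in for a 256-bit identifier in the DHT keyspace.
type ID string

// xorCloser reports whether a is XOR-closer to target than b.
func xorCloser(a, b, target ID) bool {
	for i := 0; i < len(target); i++ {
		da, db := a[i]^target[i], b[i]^target[i]
		if da != db {
			return da < db
		}
	}
	return false
}

// lookup runs the iterative Kademlia lookup. findNode stands in for
// the FIND_NODE RPC: given a peer and a target, it returns the k
// closest peers that peer knows to the target.
func lookup(target ID, seed []ID, findNode func(peer, target ID) []ID) []ID {
	shortlist := append([]ID(nil), seed...)
	seen := map[ID]bool{}
	queried := map[ID]bool{}
	for _, p := range seed {
		seen[p] = true
	}

	for {
		// Keep the shortlist ordered by distance to the target.
		sort.Slice(shortlist, func(i, j int) bool {
			return xorCloser(shortlist[i], shortlist[j], target)
		})

		// Pick up to alpha unqueried peers among the k closest.
		var batch []ID
		for i, p := range shortlist {
			if i == k || len(batch) == alpha {
				break
			}
			if !queried[p] {
				batch = append(batch, p)
			}
		}

		// Terminate once the k closest known peers have all been queried.
		if len(batch) == 0 {
			if len(shortlist) > k {
				shortlist = shortlist[:k]
			}
			return shortlist
		}

		// A real implementation sends these requests in parallel.
		for _, p := range batch {
			queried[p] = true
			for _, q := range findNode(p, target) {
				if !seen[q] {
					seen[q] = true
					shortlist = append(shortlist, q)
				}
			}
		}
	}
}
```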
DHT Operations
The libp2p Kademlia DHT supports five primary operations:
FIND_NODE
Locates the k closest peers to a given key. This is the fundamental building block for all DHT operations.
Request: FIND_NODE(key)
Response: List of k closest peers to key
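In the Go implementation (github.com/libp2p/go-libp2p-kad-dht), this primitive is exposed directly; the sketch below assumes a recent version, where GetClosestPeers returns a slice (older releases returned a channel), and an already-bootstrapped DHT instance.

```go
import (
	"context"
	"fmt"

	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// printClosestPeers asks an existing DHT instance for the k closest
// peers to an arbitrary key.
func printClosestPeers(ctx context.Context, kdht *dht.IpfsDHT, key string) error {
	peers, err := kdht.GetClosestPeers(ctx, key)
	if err != nil {
		return err
	}
	for _, p := range peers {
		fmt.Println("close peer:", p) // p is a peer.ID
	}
	return nil
}
```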
PUT_VALUE / GET_VALUE
Store and retrieve arbitrary values in the DHT. Values are stored at the k nodes closest to the key.
PUT_VALUE(key, value) → Stores value at nodes closest to key
GET_VALUE(key) → Retrieves value from nodes closest to key
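With go-libp2p-kad-dht, these map onto the PutValue and GetValue methods. A minimal sketch, assuming an already-bootstrapped *dht.IpfsDHT; note that keys must belong to a namespace the node has a record validator for (the Go implementation ships validators for /pk/ and /ipns/ by default), so arbitrary keys are rejected.

```go
import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// putThenGet stores a value at the k closest nodes to key,
// then reads it back with a GET_VALUE lookup.
func putThenGet(ctx context.Context, kdht *dht.IpfsDHT, key string, value []byte) ([]byte, error) {
	if err := kdht.PutValue(ctx, key, value); err != nil {
		return nil, err
	}
	return kdht.GetValue(ctx, key)
}
```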
ADD_PROVIDER / GET_PROVIDERS
These operations are specific to content routing. Instead of storing the content itself, nodes advertise that they can provide content for a given key.
ADD_PROVIDER(key) → Advertise this node as a provider for key
GET_PROVIDERS(key) → Find nodes that provide content for key
Provider records are used extensively in systems like IPFS, where nodes advertise which content blocks they have available.
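In Go, the advertise side looks roughly like this, assuming a running *dht.IpfsDHT and the github.com/ipfs/go-cid package; passing true asks the DHT to announce the record to the network rather than only recording it locally.

```go
import (
	"context"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// advertise publishes a provider record: "this node can serve the
// content identified by c."
func advertise(ctx context.Context, kdht *dht.IpfsDHT, c cid.Cid) error {
	return kdht.Provide(ctx, c, true)
}
```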
Peer Routing vs Content Routing
The DHT serves two distinct but related purposes:
Peer Routing
When you have a Peer ID and need to find how to connect to that peer:
- Compute the DHT key from the Peer ID
- Perform a FIND_NODE lookup for that key
- The lookup will either find the peer directly or find nodes close enough to have the peer's address in their routing table
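In go-libp2p, this whole sequence is wrapped by the FindPeer method; a minimal sketch, assuming an already-bootstrapped DHT:

```go
import (
	"context"

	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/peer"
)

// resolvePeer turns a Peer ID into dialable addresses via the DHT.
func resolvePeer(ctx context.Context, kdht *dht.IpfsDHT, id peer.ID) (peer.AddrInfo, error) {
	// Internally this runs a FIND_NODE-style lookup toward the
	// key derived from the Peer ID.
	return kdht.FindPeer(ctx, id)
}
```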
Content Routing
When you want to find peers that have specific content:
- Compute the DHT key from the content identifier (e.g., CID in IPFS)
- Perform a GET_PROVIDERS lookup for that key
- Receive a list of peers that have advertised as providers for that content
- Connect to one or more providers to retrieve the content
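The lookup side maps onto FindProvidersAsync in the Go implementation. A sketch, assuming a libp2p host plus a bootstrapped DHT; the content-retrieval step itself is application-specific.

```go
import (
	"context"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// fetchFromProviders finds up to n providers for c and connects to them.
func fetchFromProviders(ctx context.Context, h host.Host, kdht *dht.IpfsDHT, c cid.Cid, n int) {
	for info := range kdht.FindProvidersAsync(ctx, c, n) {
		// info holds the provider's peer ID and advertised addresses.
		if err := h.Connect(ctx, info); err != nil {
			continue // unreachable provider; try the next one
		}
		// ...request the content over an application protocol...
	}
}
```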
Bootstrap Process
When a new node joins the network, it needs to populate its routing table:
- The node starts with a list of "bootstrap" nodes — well-known, stable nodes in the network
- It connects to bootstrap nodes and performs a FIND_NODE lookup for its own ID
- This lookup populates the routing table with nodes across the ID space
- The node may perform additional random lookups to further diversify its routing table
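Put together, bootstrapping in Go looks roughly like this; GetDefaultBootstrapPeerAddrInfos and the option names come from go-libp2p-kad-dht, and Bootstrap kicks off the self-lookup and the periodic refreshes described above.

```go
import (
	"context"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func startDHT(ctx context.Context) (*dht.IpfsDHT, error) {
	h, err := libp2p.New()
	if err != nil {
		return nil, err
	}
	// Seed the routing table with the well-known public bootstrap peers.
	kdht, err := dht.New(ctx, h,
		dht.BootstrapPeers(dht.GetDefaultBootstrapPeerAddrInfos()...),
	)
	if err != nil {
		return nil, err
	}
	// Connect to the seed peers, look up our own ID, and begin
	// periodic routing-table refreshes.
	return kdht, kdht.Bootstrap(ctx)
}
```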
Client vs Server Mode
libp2p DHT nodes can operate in two modes:
Server Mode
Server mode nodes:
- Respond to DHT queries from other nodes
- Store records (values and provider records) for keys they are close to
- Are included in other nodes' routing tables
- Require a publicly reachable address
Client Mode
Client mode nodes:
- Can query the DHT but do not respond to queries
- Do not store records for other nodes
- Are not included in other nodes' routing tables
- Suitable for nodes behind NAT or with limited resources
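In go-libp2p-kad-dht, the mode is a constructor option; ModeAuto (the default) switches between client and server depending on whether the node appears publicly reachable. A minimal sketch:

```go
import (
	"context"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func newClientDHT(ctx context.Context) (*dht.IpfsDHT, error) {
	h, err := libp2p.New()
	if err != nil {
		return nil, err
	}
	// ModeClient: query-only; ModeServer: also answer queries and
	// store records; ModeAuto: pick based on reachability.
	return dht.New(ctx, h, dht.Mode(dht.ModeClient))
}
```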
Security Considerations
The DHT is subject to several potential attacks:
- Sybil Attacks: An attacker creates many fake identities to gain disproportionate influence
- Eclipse Attacks: An attacker surrounds a target node with malicious nodes to control its view of the network
- Content Poisoning: Malicious nodes return incorrect data for queries
libp2p mitigates these through:
- Signed records with public key validation
- Query responses validated against the original key
- Disjoint lookup paths in some implementations
For more details, see the Security Considerations guide.
Implementation Details
Different libp2p implementations may have varying DHT parameters:
| Parameter | Description | Typical Value |
|---|---|---|
| k | Bucket size (max nodes per bucket) | 20 |
| α (alpha) | Parallelism factor for lookups | 3 |
| Record TTL | How long records are stored | 24-48 hours |
| Refresh interval | How often buckets are refreshed | 1 hour |
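In the Go implementation, these knobs correspond to constructor options; the option names below are from go-libp2p-kad-dht and may differ in other implementations.

```go
import (
	"context"
	"time"

	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func newTunedDHT(ctx context.Context) (*dht.IpfsDHT, error) {
	h, err := libp2p.New()
	if err != nil {
		return nil, err
	}
	return dht.New(ctx, h,
		dht.BucketSize(20),                       // k
		dht.Concurrency(3),                       // α (alpha)
		dht.MaxRecordAge(36*time.Hour),           // record TTL
		dht.RoutingTableRefreshPeriod(time.Hour), // bucket refresh
	)
}
```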
Further Reading
- Kademlia DHT guide — Detailed Kademlia implementation specifics
- libp2p DHT specification — Technical specification
- Original Kademlia paper — Academic paper by Maymounkov and Mazières