creativecoding

Reputation: 257

If a node of a DHT fails, will the values become unavailable?

I'm reading up on DHTs, but I'm struggling to find information on what happens to the values in a DHT when a node fails.

As far as I understand, without redundancy of data (hash table values) the failure of a single node would simply make the values stored in that node unavailable. But if I wanted to use DHTs as storage for any system, I would like that system to be able to rely on the availability of all storage at any time, right? Maybe data redundancy is treated as a separate, independent problem here, but then the decentralization of a DHT would introduce additional points of failure, which seems like a huge downside of DHTs.

So how are values kept accessible, if the node responsible for those values fails?

Upvotes: 3

Views: 372

Answers (1)

the8472

Reputation: 43052

As far as I understand, without redundancy of data (hash table values) the failure of a single node would simply make the values stored in that node unavailable.

That is tautological. Yes, if you choose no redundancy then there is no redundancy.

But if I wanted to use DHTs as storage for any system, I would like that system to be able to rely on the availability of all storage at any time, right?

That depends on how much availability you actually need. No system is 100% reliable.

Also, DHTs are usually not used as storage systems, at least not for long-lived bulk data. A DHT should be considered a dynamic value lookup system, similar to DNS, but distributed and peer-to-peer.

So how are values kept accessible, if the node responsible for those values fails?

The simplest approach is to publish the data with redundancy, i.e. write it to multiple nodes: either the N nodes closest to the target ID, or nodes chosen by some other deterministic key derivation that yields multiple addresses. The responsibility for republishing the data to compensate for churn of storage nodes can also lie with the originator of the data. This keeps the implementation complexity and the security/game-theoretic aspects simple.
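As a rough illustration of the "N closest nodes" variant, here is a minimal in-memory sketch. It assumes a Kademlia-style XOR distance and SHA-1 IDs, and it cheats by giving every caller a global view of all nodes; a real DHT would find the closest nodes through an iterative network lookup. `ToyDHT`, `key_id`, and `replicas` are hypothetical names made up for this example, not part of any real library.

```python
import hashlib


def key_id(data: bytes) -> int:
    """Map bytes to a 160-bit ID (SHA-1 is an assumption, as in Kademlia-style DHTs)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ToyDHT:
    """Toy model with a global view of all nodes; real DHTs route lookups instead."""

    def __init__(self, node_names, replicas=3):
        self.replicas = replicas
        # node ID -> that node's local key/value store
        self.nodes = {key_id(name.encode()): {} for name in node_names}

    def closest(self, target: int):
        """Return the IDs of the `replicas` live nodes closest to target (XOR metric)."""
        return sorted(self.nodes, key=lambda nid: nid ^ target)[: self.replicas]

    def put(self, key: str, value: str) -> None:
        """Write the value to every node in the closest set."""
        for nid in self.closest(key_id(key.encode())):
            self.nodes[nid][key] = value

    def get(self, key: str):
        """Ask each node in the closest set until one of them has the value."""
        for nid in self.closest(key_id(key.encode())):
            if key in self.nodes[nid]:
                return self.nodes[nid][key]
        return None

    def fail(self, nid: int) -> None:
        """Simulate a node leaving the network together with its stored data."""
        del self.nodes[nid]

    def copies(self, key: str) -> int:
        """Count how many live nodes currently hold the key."""
        return sum(key in store for store in self.nodes.values())


dht = ToyDHT([f"node-{i}" for i in range(20)], replicas=3)
dht.put("some-key", "some-value")
copies_initial = dht.copies("some-key")

# The closest storage node fails; the value survives on the other replicas.
dht.fail(dht.closest(key_id(b"some-key"))[0])
value_after_failure = dht.get("some-key")
copies_after_failure = dht.copies("some-key")

# The originator republishes, which restores the full replica count.
dht.put("some-key", "some-value")
copies_after_republish = dht.copies("some-key")
```

In this model the originator only has to call `put` again periodically; the storage nodes never need to coordinate with each other, which is what keeps the scheme simple.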

Storage nodes themselves could also perform redundancy republishing to ensure that data remains available in the absence of the originating node. The problem with this approach is that it is difficult to secure and incentivize correctly on public networks, especially when there are multiple implementations. In closed environments this is more feasible.
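For comparison, a storage-side repair round could look roughly like the sketch below. It is again a toy model with a global node view, XOR distance, and SHA-1 IDs as assumptions; `repair_round`, `closest`, `to_id`, and `REPLICAS` are hypothetical names. It also trusts every node blindly, which is exactly the part that is hard to get right on a public network.

```python
import hashlib

REPLICAS = 3  # assumed replication factor


def to_id(text: str) -> int:
    """Map a string to a 160-bit ID (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


def closest(nodes: dict, key: str) -> list:
    """IDs of the REPLICAS live nodes closest to the key (XOR metric)."""
    target = to_id(key)
    return sorted(nodes, key=lambda nid: nid ^ target)[:REPLICAS]


def repair_round(nodes: dict) -> int:
    """Every live node pushes each value it holds to the current closest set.

    Returns the number of new copies created. There is no authentication
    or rate limiting here; a public network would need both.
    """
    created = 0
    for nid in list(nodes):
        for key, value in list(nodes[nid].items()):
            for peer in closest(nodes, key):
                if key not in nodes[peer]:
                    nodes[peer][key] = value
                    created += 1
    return created


# 20 nodes, one value stored on its 3 closest nodes.
nodes = {to_id(f"node-{i}"): {} for i in range(20)}
for nid in closest(nodes, "some-key"):
    nodes[nid]["some-key"] = "some-value"

# Two of the three replicas fail, and the originator is assumed to be offline.
for nid in closest(nodes, "some-key")[:2]:
    del nodes[nid]
copies_before = sum("some-key" in store for store in nodes.values())

# The surviving holder re-replicates to the new closest set.
created = repair_round(nodes)
copies_after = sum("some-key" in store for store in nodes.values())
```

The repair only works as long as at least one honest replica survives between rounds, so the round interval has to be short relative to the expected churn.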

Upvotes: 3
