Reputation: 1518
I am quite confused about how node discovery is made in Distributed Hash Table algorithms (CHORD). Suppose every node is reachable via multicast. Why would the following scenario be bad:
There is no response, decides it is alone.
Second node arrives.
Adjacent nodes also starts transmitting the necessary key-value pairs to this new arriving node.
A client asks a node for a key, every node knows which ip/port this key corresponds to. Asks that node for the KEY-VALUE pair and returns this to the client.
Now I suppose this scenario is bad because every node could not keep a list of all nodes in a HUGE system. Am I correct?
But then how can a node discover its so called FINGER TABLE?
I know bittorrent keeps a central server as starting node in DHT systems. Is it possible to eliminate this server, if we assume that we can reach every node via multicast?
Thanks in advance. Sorry for multiple questions, but I don't think I could not demonstrate my confusion with a single question.
Upvotes: 0
Views: 758
Reputation: 43052
You seem to be conflating two separate issues:
Since you mention multicast and that would fall under 1. I'll focus on that, 2. is bog-standard once you got the first node and has no multicast-specific properties.
So, if you are in a controlled environment where multicast is available and you want to bootstrap a DHT then you can have all nodes join the same multicast group and occasionally announce their presence. The issue here is that a naive implementation would not scale. Traffic would grow with the number of nodes.
But the solution is quite simple, give each node a randomized, exponential backoff when it sees an advertisement packet. In other words, the amount of advertisement messages in flight should be kept constant, regardless of the number of nodes in the DHT.
It doesn't matter who is sending the advertisement package. A node does not necessarily have to announce itself. It is sufficient to receive a packet to learn about any other node and then commence to populate its routing table based on the received packet, if it needs to.
Nodes should not contact the advertising node during steady state operation, since they already have a populated routing table. They don't need to. And if potentially hundreds or thousands of nodes were to do so they would overwhelm the currently advertising node.
To spread load even further the advertising node could also include a list of random nodes from its routing table into the advertisement so that a currently bootstrapping node could pick one at random.
If you have other questions regarding the discovery or bootstrap-processes that are not specific to multicasting I would recommend creating new questions.
Upvotes: 1