Reputation: 4884
I am looking for information on techniques, algorithms, etc. for maintaining network integrity in a dynamic peer-to-peer network. Practical implementations, academic papers, and anything else in that category are welcome.
Imagine a network that is solely peer-to-peer based, where each node is only connected to x other nodes. Without a grand list of all nodes, each node is responsible for maintaining its own connection to the network. Nodes go down and come up dynamically, so each node needs to ask its neighbors (and their neighbors?) for new nodes to connect to, in order to maintain its x connections.
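The neighbor-asking scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real networked implementation: the `Node` class, `TARGET_CONNECTIONS` constant, and in-memory neighbor sets are all hypothetical stand-ins for actual peers and wire messages.

```python
import random

TARGET_CONNECTIONS = 4  # the "x" from the question


class Node:
    """Toy peer that refills its neighbor set by asking its current
    neighbors for their own peer lists (here, reading them directly)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbors = set()

    def known_peers(self):
        # Peers two hops away: the union of our neighbors' neighbor sets,
        # minus ourselves and nodes we are already connected to.
        peers = set()
        for n in self.neighbors:
            peers |= n.neighbors
        peers.discard(self)
        return peers - self.neighbors

    def maintain(self):
        # Top up to TARGET_CONNECTIONS using peers learned from neighbors.
        candidates = list(self.known_peers())
        random.shuffle(candidates)
        while len(self.neighbors) < TARGET_CONNECTIONS and candidates:
            peer = candidates.pop()
            self.neighbors.add(peer)
            peer.neighbors.add(self)
```

In a real network the `known_peers` step would be a request/response exchange, and you would also want liveness checks before counting a neighbor toward x.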
My main interests are avoiding network segmentation (where two halves of the network are connected only through one node on each side, so that if either of those nodes goes down, the network splits in two) and efficient routing (distance metrics, etc.), but anything relating to networks matching this description would be interesting.
I am currently looking at the Chord DHT protocol as it has some similarity to what I'm asking.
Upvotes: 3
Views: 791
Reputation: 37600
Various ad-hoc P2P networks have been developed for ubiquitous computing, and they would probably fit your needs. They have been used, for example, by the military: small capsules are deployed, each talking to its neighbors and usually relaying up to some command center. If you don't have a center, your problem may be related to distributed computing. In any case, here are some links:
Upvotes: 2
Reputation: 3609
Just to avoid reinventing the wheel, take a look at the various routing protocols. OSPF might be a good starting point, given your scenario. Of course there are many, many variables that might make it not the best choice for you. Such as:
Upvotes: 1
Reputation: 40193
Use Chord. http://en.wikipedia.org/wiki/Chord_(peer-to-peer)
I've implemented it in projects before and it addresses these problems.
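The core of Chord is consistent hashing onto a ring: nodes and keys share one identifier space, and a key is owned by its successor, the first node clockwise from it. A minimal sketch (using a 16-bit space for readability; real Chord uses 160-bit SHA-1 IDs, and does the lookup via finger tables rather than a sorted list):

```python
import hashlib
from bisect import bisect_left

M = 16  # bits in the identifier space (real Chord uses 160)


def chord_id(name):
    # Hash a node name or key onto the ring, as Chord does with SHA-1.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)


def successor(node_ids, key_id):
    # The successor of a key is the first node ID at or after it,
    # moving clockwise around the ring (wrapping past the top).
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]
```

When a node fails, its keys fall through to the next node on the ring, which is what makes Chord robust to churn; Chord additionally keeps a successor list precisely to survive the segmentation scenario in the question.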
Upvotes: 0
Reputation: 3414
Have you looked at Kademlia? It is similar to Chord, and versions of it are used by BitTorrent and eMule. The paper lists a few measures for ensuring network integrity, even in the face of attack. The two basic ones are
I'm not sure how much of this applies to Chord, because I haven't read as much about it, but I think going with a DHT is a good idea unless you need fuzzy searching.
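For reference, Kademlia's distinctive feature is its XOR distance metric: the distance between two IDs is simply their bitwise XOR, treated as an integer. Each lookup step keeps the k known nodes closest to the target, which is how the protocol converges on a key. A minimal sketch (the function names are mine, not from the paper):

```python
def xor_distance(a, b):
    # Kademlia's metric: the distance between two IDs is their bitwise XOR.
    return a ^ b


def closest_nodes(known_ids, target, k=3):
    # Keep the k known nodes closest to the target under XOR distance;
    # iterating this against ever-closer nodes is the Kademlia lookup.
    return sorted(known_ids, key=lambda n: xor_distance(n, target))[:k]
```

Because XOR is symmetric, a node tends to receive queries from exactly the nodes it would itself route through, so routing tables stay fresh as a side effect of normal traffic.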
Upvotes: 0
Reputation: 3812
These are my thoughts only -- not a complete solution, and not tested in practice -- but they may still touch on a number of interesting problems and potential solutions.
Standardised time for node failure and rejoining must be recorded and managed. To achieve this, the network does not operate on a real-time basis but on an animation-frame-number basis. Have N front-end processors (FEPs) assigning an FEP ID, a job ID, and a network animation frame number to incoming jobs. There are a number of real-time issues that even quantizing time does not quite address; in some exception cases it's a bit like accounting, where events are posted to the period in which they should be regarded as occurring rather than to when any cash actually moves.
For high performance, the heartbeat packets must also contain details of jobs being performed and recently completed or abandoned as well as the inventory of hosts in the network.
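A heartbeat payload along these lines might look like the following sketch (the `Heartbeat` class and its field names are hypothetical, just to make the contents concrete):

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """Hypothetical heartbeat payload carrying job and host state
    alongside the usual liveness signal."""
    sender_id: str
    frame: int                                      # animation frame number, not wall-clock time
    jobs_in_progress: list = field(default_factory=list)
    jobs_completed: list = field(default_factory=list)
    jobs_abandoned: list = field(default_factory=list)
    known_hosts: list = field(default_factory=list)  # this node's view of the host inventory
```

Piggybacking this state on heartbeats means peers learn about job progress and membership changes without any extra message round trips.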
The network proceeds to process work items and publish their results to adjacent peers or FEPs. FEPs forward completed job details to clients, and can take over for failed FEPs, since the only state in an FEP is the last serial number stamped on a request.
The network must have a quorum to continue. External monitors track connectivity and inform the nodes that experience connectivity changes whether they are now inside or outside the quorum.
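The quorum rule is what prevents the split-brain problem from the question: with a strict-majority test, at most one partition of a segmented network can keep processing. A one-line sketch (function name is mine):

```python
def has_quorum(reachable, total):
    # Strict majority: after a network split, at most one side can
    # satisfy this, so two halves can never both keep processing.
    return reachable > total // 2
```

Note the even-split case: if 4 nodes partition 2/2, neither side has quorum and the whole network stalls, which is the price of ruling out divergence.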
Where a work item is not completed because its machine fails, or when a new node joins the network, a new work-allocation policy must be established, based on work item ID, to distribute the work across the remaining nodes until the failed node comes back online.
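One way to get an allocation policy keyed on work item ID that stays stable across node failure and rejoin is rendezvous (highest-random-weight) hashing -- my suggestion, not something the paragraph above prescribes. When a node drops out, only the items it owned are reassigned; everything else keeps its owner:

```python
import hashlib


def owner(work_item_id, live_nodes):
    """Rendezvous hashing: every (node, item) pair gets a pseudo-random
    score, and the highest-scoring live node owns the item."""
    def score(node):
        h = hashlib.sha1(f"{node}:{work_item_id}".encode()).hexdigest()
        return int(h, 16)
    return max(live_nodes, key=score)
```

When the failed node rejoins, exactly the items it previously owned flow back to it, so the rebalancing on both failure and recovery is minimal.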
For cases where multiple nodes perform the same job (duplicated effort -- possible, but minimised by designing the usual timeouts sensibly), the jobs must be rollbackable, with the conflict resolved using Markov chains.
To detect possible duplications reliably, jobs must auto-rollback in less time than the timeout for receiving job results that applies during a crisis period, i.e. when nodes are failing. A shorter timeout applies when nodes are not failing.
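The timeout ordering described above can be pinned down in a few lines. The constants and their values here are illustrative only; the point is the invariant between them:

```python
NORMAL_TIMEOUT = 5.0    # seconds to wait for job results while nodes are healthy
CRISIS_TIMEOUT = 20.0   # longer result timeout used while nodes are failing
AUTO_ROLLBACK = 15.0    # a duplicated job must roll itself back by this deadline


def result_timeout(in_crisis):
    # Pick the applicable timeout for receiving job results.
    return CRISIS_TIMEOUT if in_crisis else NORMAL_TIMEOUT


# The invariant that makes duplicate detection reliable: a duplicated
# job rolls itself back before the crisis-period result timeout fires.
assert AUTO_ROLLBACK < CRISIS_TIMEOUT
```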
Upvotes: 2
Reputation: 45081
The Netsukuku project aims to create protocols and a software implementation for large wifi-based ad-hoc networks.
From their FAQ: "The Netsukuku project is based on the very simple idea of exploiting the great potentiality of the wifi connectivity, making the PCs of wireless communities act as routers and handle together an ad-hoc network even bigger than the Internet."
Upvotes: 3