MK.
MK.

Reputation: 4027

algorithm to replicate data between computers

Suppose I have n computers. Each of them has a set of integers. Each computer will not have the same set.

i.e. computer 1 has {1,2,3,4}, computer 2 has {4, 5,10,20,21}, computer 3 has {-10,3,5} and so on.

I want to replicate this data so that all computers will have all the integers , i.e. all of them will have {-10,1,2,3,4,5,10,20,21}

I want to minimize the number of messages that each computer sends and also minimize the time. (i.e. avoid a serial approach where computer 1 first communicates with everyone and gets the data it is missing, then computer 2 does the same and so on.

What is an efficient way of doing this?

Thanks.

Upvotes: 2

Views: 134

Answers (4)

Uours
Uours

Reputation: 2492

Minimal approach would be : All computers send info to just one ( master ) computer and get the result

For reliability you could consider at least two computers as master computers

Assumptions :

  1. Total n computers
  2. One of the computers is considered as master

Algorithm :

  1. All computers send input-info to Master ( total n-1 messages )
  2. Master processes the info
  3. Master sends the result-info to all computers ( total n-1 messages )

Reliability :

Total failure of the system based on this algorithm can only occur if all the masters failed .

Efficiency :

With 1 master  , total messages : 2 * (n-1)
With 2 masters , total messages : 2 * 2 * (n-1)
With 3 masters , total messages : 3 * 2 * (n-1)

Upvotes: 1

Arun Taylor
Arun Taylor

Reputation: 1572

If all the computers are on the same network, you could use UDP sockets with SO_BROADCAST option.

This way when one computer does a message 'send', all the other computers would 'recv' the message and update as necessary.

Upvotes: 1

Will Hartung
Will Hartung

Reputation: 118754

Well, there's already a system that does this, is widely adopted, well documented, widely available and, while it's perhaps not perfect (for assorted definitions of perfect), it's practical.

It's called RSync.

You can start here: http://www.samba.org/~tridge/phd_thesis.pdf

Upvotes: 0

user1952500
user1952500

Reputation: 6771

Here's one way of doing it in 2*n - 2 moves. Model the machines as nodes in a linked-list and numbered from 1..n.

  1. Let node 1 send all of its data in one message to node 2.
  2. Node 2 remembers the message node 1 sent, performs a union of its content with node 1's content and sends the unified message to Node 3. Then Node 2 waits for a response from Node 3.
  3. Node 3 does the same as above and so on until we get to node 'n'. Node 'n' now has the complete set.
  4. Node 'n' already knows what message node 'n - 1' sent it, so it sends the diff back to node 'n - 1'
  5. Node 'n - 1' performs the union as above. Since it has remembered the message of node 'n - 2' (in step 2 above), it can send the diff back to node 'n - 3'

and so on.

I think it is not complex to show that the above leads to 2 * (n - 1) messages being sent in the network.

I think it can be proven that 2n - 2 is necessary by considering each node to have a unique element. It should be a short exercise in mathematical induction to prove that 2n - 2 is necessary..

Upvotes: 0

Related Questions