q0987
q0987

Reputation: 35982

Good Algorithms for real time data consistence check

Given multiple servers (more than one), each stores the following information:

key1 => value1
key2 => value2
key1 => newValue1
key3 => value3
...
key4 => value4
...
key3 => newValue3
...
keyN => valueN

The key-value pair received by each server is coming in sequence in real time. We would like to design a monitor program to automatically check the data consistence across different servers.

Proposal 1> The simplest idea is to build a hashtable for each (key-value) pair on the server. However the size of the hashtable is very large, it will be extremely slow if we have to compare the full table every minute/second. Each server receives the data sequence with some network latency so we have to constantly check the consistency across multiple server.

Proposal 2> If we don't care which key-value is inconsistent then we can generate an unique hash number based on each (key-value) pair and compare the computed hash number across different server. However, this method cannot help identify which key-value pair is missing or mismatched among servers.

Question> This question should be very common and we expect there is a pre-existent algorithm that can help us solve the problems effectively. Any suggestion will be welcome.

Thank you

Upvotes: 3

Views: 54

Answers (1)

amit
amit

Reputation: 178471

Might be looking for a Merkle Tree.

In a Merkle Tree, every leaf is a key-value pair, and its hash value.
Each internal node is some combination of the value of its children (hash of their values is common practice).

This allows you (with high probability):

  1. Fast (O(1)) check if two servers have the same key-values stored.
  2. Logarithmic time to find each mismatch (added/changed key-value pair).
  3. logarithmic time insertion/deletion/modification for each entry in each server.

Upvotes: 2

Related Questions