Reputation: 6278
Ok, I am designing a piece of software that will keep one system synced with another. The problem is that the originating system is a legacy DB2 nightmare to which I only have read-only access, and its tables have no timestamping capability whatsoever, meaning there is no way to detect which rows were changed.
My idea is to just load all the rows (about 60,000 in total, synced every half hour), calculate their hashes, and keep <ID, hash> tuples in my integration database. Change detection then becomes a matter of comparing hashes and updating records in the destination system wherever a hash mismatches or a tuple is missing altogether. I forgot to mention that reading the source is cheap but updating the destination is expensive: it's a web service with a lot of background processing, so I want to avoid updating everything every time.
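To illustrate, here is a minimal sketch of that comparison step; SourceRow, HashRow and UpdateDestination are placeholders for my own types and methods, not anything real:

// storedHashes: the <ID, hash> tuples persisted in the integration database.
// sourceRows: all ~60,000 rows freshly read from DB2 (reading is cheap).
void Sync(IDictionary<long, ulong> storedHashes, IEnumerable<SourceRow> sourceRows)
{
    foreach (var row in sourceRows)
    {
        ulong newHash = HashRow(row); // any stable 64-bit hash of the row's fields

        // Push to the destination only when the row is new or its hash changed,
        // since updating the destination web service is the expensive part.
        if (!storedHashes.TryGetValue(row.Id, out ulong oldHash) || oldHash != newHash)
        {
            UpdateDestination(row);
            storedHashes[row.Id] = newHash;
        }
    }
}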
Now, my problem: the built-in C# GetHashCode is documented as unsuitable for this purpose (an equal hash does not imply an equal object, and the value isn't guaranteed to be stable across runs), while crypto hashes seem like a big overkill with 256+ bit hashes. I don't think more than 64 bits is needed: by the birthday approximation, 60,000 rows against a perfectly distributed 64-bit hash gives roughly n^2 / 2^65, about a 1 in 10^10 chance of any collision, and a 64-bit value allows fast hash comparison on an x64 arch.
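For example, a simple non-cryptographic 64-bit hash such as FNV-1a is the kind of thing I have in mind; a sketch, using the published FNV-1a 64-bit offset basis and prime:

// FNV-1a, 64-bit: fast, stable across runs, not cryptographically secure.
static ulong Fnv1a64(byte[] data)
{
    const ulong offsetBasis = 14695981039346656037UL;
    const ulong prime = 1099511628211UL;

    ulong hash = offsetBasis;
    foreach (byte b in data)
    {
        hash ^= b;     // xor the next octet into the low bits
        hash *= prime; // multiply by the FNV prime (wraps on overflow)
    }
    return hash;
}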
So what should I use to generate unique hashes?
Upvotes: 1
Views: 176
Reputation: 21480
Another option: calculate the hash in C# using a function like this:
// Reused across calls; note HashAlgorithm instances are not thread-safe.
private static readonly System.Security.Cryptography.HashAlgorithm hash = System.Security.Cryptography.SHA1.Create();

public static string CalculateSignature(IEnumerable<object> values)
{
    var sb = new StringBuilder();
    foreach (var value in values)
    {
        // Represent nulls with a sentinel and separate values with a NUL
        // character so adjacent values can't run together.
        string valueToHash = value == null ? ">>null<<" : Convert.ToString(value, CultureInfo.InvariantCulture);
        sb.Append(valueToHash).Append('\0');
    }
    var bytesToHash = Encoding.UTF8.GetBytes(sb.ToString());
    var hashedBytes = hash.ComputeHash(bytesToHash);
    // Base64 keeps the raw hash bytes intact; decoding them as UTF-8 text
    // would corrupt byte sequences that aren't valid UTF-8.
    return Convert.ToBase64String(hashedBytes);
}
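You'd call it with the row's column values in a fixed order; for example (values here are made up):

// Hypothetical row: id, last name, first name, a date. Any column order
// works as long as it is the same on every sync run.
var signature = CalculateSignature(new object[] { 42, "Smith", "John", new DateTime(2012, 1, 1) });
// then store <42, signature> in the integration database

If you really want a 64-bit value, you could take the first 8 bytes of the 20-byte SHA1 output with BitConverter.ToUInt64 instead of Base64-encoding all of them.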
Edit: Hashing profiling test
To show how fast SHA1 hashing is, here's a quick test. On my dev machine, I get 60,000 hashes in 176ms; MD5 takes 161ms:
var hash = System.Security.Cryptography.MD5.Create(); // swap in SHA1.Create() to time SHA1
var stringToHash = "3490518cvm90wg89puse5gu3tgu3v0afgmvkldfjgmvvvvvsh,9semc9petgucm9234ucv0[vhd,flhgvzemgu904vq2m0";
var sw = System.Diagnostics.Stopwatch.StartNew();
for (var i = 0; i < 60000; i++)
{
    var bytesToHash = Encoding.UTF8.GetBytes(stringToHash);
    var hashedBytes = hash.ComputeHash(bytesToHash);
    var signature = Convert.ToBase64String(hashedBytes);
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Upvotes: 2
Reputation: 21480
In your staging SQL tables, add a 'checksum' column and populate it with SQL Server's CHECKSUM function.
Something like this (check is a reserved word, so it needs brackets as a column name):
update mysourcetable set [check] = checksum(id, field1, field2, field3, field4 ...)
Clarification
You mentioned having an integration database; my thought was that you would read the data from DB2 into an interim database, like SQL Server, where you're already storing the ID/hash pairs. If you copied all the data out of DB2, not just the IDs, then you could calculate the checksum in the integration database itself.
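Change detection is then a join against the checksums from the previous run; a rough sketch, with made-up table names (mysourcetable is the freshly staged copy, mysnapshot holds the id/checksum pairs from last time):

-- Rows that are new, or whose checksum changed since the last sync,
-- are the only ones worth pushing to the expensive destination service.
select s.id
from   mysourcetable s
       left join mysnapshot p on p.id = s.id
where  p.id is null
   or  p.[check] <> s.[check]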
Upvotes: 0