I'm comparing two datasets in SQL Server (tables with the same schema) using row hashing (for example, with CheckSum() or HashBytes()). At this point, I can tell which records are identical and which are different. For the records that differ, I'm looking for a way to quantify the difference. For example, consider the simplified rows below:
table1: row11: 0, 0, 0 --> hash1 = 0x0000
table2: row21: 0, 0, 1 --> hash2 = 0x0001
table3: row31: 1, 1, 1 --> hash3 = x
That row11, row21 and row31 are all different is apparent from the fact that their hashes are pairwise different: hash1 <> hash2, hash2 <> hash3 and hash1 <> hash3.
The question is: how do I associate a magnitude of difference with the hash values? In other words, how can I tell, just from the hash values, that the pair (row11, row21) is "more similar" than the pair (row11, row31)?
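To make the setup concrete, here is a minimal Python sketch of what I mean (not my actual T-SQL). The helper names row_hash and differing_columns are made up for illustration, SHA-256 stands in for whatever HashBytes() algorithm is used, and "number of differing columns" is only my assumption of what a reasonable magnitude might be.

```python
import hashlib


def row_hash(row):
    """Hash a row by joining its column values with a delimiter (SHA-256).

    Hypothetical stand-in for hashing the concatenated columns with HashBytes().
    """
    joined = "|".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def differing_columns(a, b):
    """Count the column positions where two same-length rows differ.

    This is the kind of magnitude I would like to recover (an assumption).
    """
    return sum(1 for x, y in zip(a, b) if x != y)


row11 = (0, 0, 0)
row21 = (0, 0, 1)
row31 = (1, 1, 1)

# What I can do today: detect that the rows differ by comparing hashes.
print(row_hash(row11) != row_hash(row21))
print(row_hash(row11) != row_hash(row31))

# What I want to know: how different they are, computed here from the raw rows.
print(differing_columns(row11, row21))
print(differing_columns(row11, row31))
```

Comparing the hashes only tells me that the rows differ. The column counts come from the raw values, and that is the information I would like to get from the hashes alone.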