Reputation: 6261
I am working on a project where I need to synchronize data from our system to an external system. What I want to achieve, is to periodically send only changed items (rows) from a custom query. This query looks like this (but with many more columns) :
SELECT T1.field1,
T1.field2,
T1.field2,
T1.field3,
CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END,
T2.field1,
T3.field1,
T4.field1
FROM T1
INNER JOIN T2 ON T2.pk = T2.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk
I want to avoid to have to compare every field one to one between synchronizations. I came with the idea that I could generate a hash for every row from my query, and compare this with the hash from the previous synchronization, which will return only the changed rows. I am aware of the CHECKSUM function, but it is very collision-prone and might miss changes sometimes. However I like the way I could just make a temp table and use CHECKSUM(*)
, which makes maintenance easier (not having to add fields in the query and in the CHECKSUM) :
SELECT T1.field1,
T1.field2,
T1.field2,
T1.field3,
CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END,
T2.field1,
T3.field1,
T4.field1
INTO #tmp
FROM T1
INNER JOIN T2 ON T2.pk = T2.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk;
-- get all columns from the query, plus a hash of the row
SELECT *, CHECKSUM(*)
FROM #tmp;
I am aware of HASHBYTES function (which supports sha1, md5, which are less prone to collisions), but it only accept varchar or varbinary, not a list of columns or * the way CHECKSUM does. Having to cast/convert every column from the query is a pain in the ... and opens the door to errors (forget to include a new field for instance)
I also noticed Change Data Capture and Change Tracking features of SQL Server, but they all seems complicated and overkill for what I am doing.
So my question : is there an other method to generate a hash from a query or a temp table that meets my criterias ?
If not, is there an other way to achieve this kind of work (to sync differences from a query)
Upvotes: 0
Views: 270
Reputation: 6261
I found a way to do exactly what I wanted, thanks to the FOR XML
clause :
SELECT T1.field1,
T1.field2,
T1.field2,
T1.field3,
CASE WHEN T1.field4 = 'some-value' THEN 1 ELSE 0 END,
T2.field1,
T3.field1,
T4.field1
INTO #tmp
FROM T1
INNER JOIN T2 ON T2.pk = T2.fk
INNER JOIN T3 ON T3.pk = T2.fk
INNER JOIN T4 ON T4.pk = T2.fk;
-- get all columns from the query, plus a hash of the row (converted in an hex string)
SELECT T.*, CONVERT(VARCHAR(100), HASHBYTES('sha1', (SELECT T.* FOR XML RAW)), 2) AS sHash
FROM #tmp AS T;
Upvotes: 1