bashan
bashan

Reputation: 3602

How can I use hashbytes like checksum?

In SQL server, we can easily generate a hash on an entire row using "checksum":

select 
checksum(*), *    
from mytable

Is there a way to do the same thing using the hashbytes function? Something like:

select 
hashbytes('md5', *), *    
from mytable

Upvotes: 2

Views: 2948

Answers (2)

Jeroen Mostert
Jeroen Mostert

Reputation: 28789

If writing out the columns is no objection, you can simply concatenate the columns and hash that:

SELECT HASHBYTES('md5', 
    ISNULL(CONVERT(VARBINARY(MAX), column1), 0x) +
    ISNULL(CONVERT(VARBINARY(MAX), column2), 0x) + 
    ...
)

From SQL Server 2012 onwards, we can achieve the same more compactly using CONCAT (thanks to @MWillemse for the tip):

SELECT HASHBYTES('md5', CONCAT(column1, column2, ...))

Be aware that this can cause collisions if your columns are "the same": hashing AA,BB,CC yields the same result as AAB,BBC,C or even AABBCC,NULL,NULL. If this is a concern, you'll have to design your own hashing scheme to minimize the odds of this (by including the column names in the hash, for example). Also, this is a binary checksum: hashing A,B,C is not the same as a,b,c. Again, if this is a concern you'll have to tweak the data before hashing it (UPPER).

Finally, MD5 is no longer considered cryptographically secure, so only use this for things like checksum indexes where you need less collision probability than with CHECKSUM. If you need a cryptographic signature of a row, this is too simplistic.

Upvotes: 1

MWillemse
MWillemse

Reputation: 980

Not easily I'm afraid. And in most cases you shouldn't really want to. But if you must, generating a XML document for each row and hashing the XML document is probably the easiest way to go:

SELECT  HASHBYTES('MD5', X.value)
FROM    Data
CROSS APPLY (
    SELECT value = 
        CAST((SELECT Data.* FROM   (SELECT 1) AS X (y) FOR XML AUTO) AS VARCHAR(MAX))
) AS X

EDIT: I've did some testing, applying the CAST(... AS VARCHAR(MAX)) as I've done in the updated statement above improves performance about 15%. More importantly, it prevents truncation errors on wide tables. Concatenating the columns yourself is even after this fix still about 15% faster. Using CONCAT() (available from MSSQL 2012) function makes your live a little easier though as you can omit applying ISNULL and CAST functions to each column.

Upvotes: 2

Related Questions