Lock
Lock

Reputation: 5522

Numeric hashing function in SQL Server?

Is there such thing as a hashing function that produces numbers as its output?

Basically- I need to create a key column in my SQL Server that is deterministic (the result is repeatable) and is based off 3 columns in the database. This column will be used as key for that piece of data that will go into a remote system (and I will use this key to match the data back up when it is created in the foreign system).

For similar things, I have been using an SHA5 hashing algorithm to create my keys, however the data I am dealing with at the moment has to be numeric.

Any ideas? The result has to be repeatable and as such has to be based off the input columns.

Upvotes: 12

Views: 29926

Answers (2)

Khal_Drogo
Khal_Drogo

Reputation: 64

You can use the Hasbytes function to hash your value with the Sha algorithm. Since you are working with numeric values you will want to convert the value you want to Hash into a varchar.

For example: If the value I want to Hash is 12345 then the SQL will look like this

SELECT HASHBYTES('SHA',CONVERT(VARCHAR(50),12345)) 

This will output the hashed value 0x8CB2237D0679CA88DB6464EAC60DA96345513964.

If you want to Hash 3 different values as one value you could do something like this

SELECT
HASHBYTES('SHA',CONVERT(VARCHAR(50),12345) + CONVERT(VARCHAR(50),6789) + CONVERT(VARCHAR(50),101112))

This will output the hash 0x66C30E0B60687DBA43CA43F39A2760BC0BEA4DE8

Just replace the Numbers with the numeric fields you are using.

Upvotes: 0

Vladimir Baranov
Vladimir Baranov

Reputation: 32693

SQL Server has several built-in functions for calculating various hashes.

It is not clear what you mean by "the data I am dealing with at the moment has to be numeric." The hash can be calculated off any kind of source data, the result of the hash function (the number of bits) depends on the chosen hash function.

Technically you can have your key defined as binary(n) with whatever number of bytes you are comfortable with. 4 and 8 bytes (int and bigint) are just special cases.


Here is a list of SQL Server hash functions that I know about.

  1. BINARY_CHECKSUM, returns int.

Returns the binary checksum value computed over a row of a table or over a list of expressions.

It may be the simplest function to use for you, since you can easily specify what columns to include in calculations:

SELECT BINARY_CHECKSUM(Col1, Col2, Col3) FROM MyTable;

The drawbacks of this function are: It returns int, which may result in high chance of collisions. We don't really know what algorithm it implements and this algorithm may be different in different versions of SQL Server. If your remote system needs to calculate the hash as well, then you have to use some well-known standard function, see HASHBYTES below.

  1. CHECKSUM, very similar to BINARY_CHECKSUM. The main difference that I saw in the docs is that CHECKSUM obeys the collation rules, such as case-sensitivity, while BINARY_CHECKSUM always uses binary values of the columns.

For example, the strings "McCavity" and "Mccavity" have different BINARY_CHECKSUM values. In contrast, for a case-insensitive server, CHECKSUM returns the same checksum values for those strings. You should avoid comparison of CHECKSUM values with BINARY_CHECKSUM values.

  1. HASHBYTES. Implements the given hashing algorithm (MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512). Returns varbinary.
SELECT 
    HASHBYTES('SHA2_512', 
        CAST(Col1 AS varbinary(8000)) + 
        CAST(Col2 AS varbinary(8000)) + 
        CAST(Col3 AS varbinary(8000))) 
FROM MyTable;

Upvotes: 15

Related Questions