Saurabh Mehta
Saurabh Mehta

Reputation: 141

Hash Value comparison for Encoded/Special characters not matching between HANA and Azure SQL DB

I am working on comparing Hash Values on SAP Hana Data and Azure SQL DB Data. However, it seems Data with special/encoded characters is returning different hash values on SAP Hana DB and SQL DW, even though the Values are same.

Query on Azure SQL DB:

SELECT HASHBYTES('MD5', '不锈钢平垫M10')
-- OUTPUT: 0xF2E5A4AF781A8D5DE0007A24E172CA47

SELECT HASHBYTES('MD5', 'ABCDEFGHIJK')
-- 0x30A4E38230885E27D1BB3FD0713DFA7D

Query on SAP Hana:

SELECT HASH_MD5(TO_BINARY('不锈钢平垫M10')) FROM DUMMY
-- OUTPUT: 0xB5C6519A1F1506431540CFF5980490B4

SELECT HASH_MD5(TO_BINARY('ABCDEFGHIJK')) FROM DUMMY
-- 0x30A4E38230885E27D1BB3FD0713DFA7D

Same is the case with SHA256 algo as well

Require support in handling the encoded/special characters with Hashing

Upvotes: 0

Views: 699

Answers (1)

Lars Br.
Lars Br.

Reputation: 10396

Conor Cunningham MSFT is spot on! The different hash values are the result of different binary values. Each DBMS stores the string differently.

As documented for SAP HANA, the (N)VARCHAR data type stores unicode data (CESU-8/UTF-16).

All the characters from the input string 不锈钢平垫M10 are from the Basic Multilingual Plane and therefore encoded in the same way as in UTF-8.

Checking the string encoding in UTF-8 yields:

cat input.txt | xxd
00000000: e4b8 8de9 9488 e992 a2e5 b9b3 e59e abef  ................
00000010: bcad efbc 91ef bc90                      ........

Which is the exact same byte sequence that SAP HANA uses to encode the string (spaces inserted for readability):

SELECT  TO_BINARY('不锈钢平垫M10') FROM DUMMY  
0xE4B8 8DE9 9488 E992 A2E5 B9B3 E59E ABEF BCAD EFBC 91EF BC90

To make the hash functions work the same, the same binary input data is required. HANA does not support any other encoding for unicode character data, but SQL Server allows for storing character data in unicode. So, one option is to convert the character data column in SQL Server to the matching unicode encoding. This blog post covers this well.

Upvotes: 2

Related Questions