Rockstaa
Rockstaa

Reputation: 23

SQL Server : Hashbytes with Umlauts

I have a strange situation figured out, when the Hashbyte function of SQL Server is not outputting the correct result when converting it to SHA2_256 with a string containing umlauts (ä,ö,ü,ß).

I run the example code in SQL Server:

 declare @cryptString varchar(50) 
 set @cryptString = 'test'

 select convert(Varchar(64), Hashbytes('SHA2_256', @cryptstring), 2)

The result is:

9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08

When I'm checking the SHA256 conversion on https://hashgenerator.de/ the result is the same.

My problem: when I'm trying to crypt for example 'müller', the result in SQL Server is:

26A45113433596C5DD53643C7652381202E8009E532A280E513D887174A9ED14

When I'm checking the SHA256 conversion on https://hashgenerator.de/ the result is different.

2dbd218072117713f2d5996a726a9b216ed791ffd0783b6ba4ab6d61b8333192

I think it could be a encoding problem, but I searched for hours and can't find any clue to solve this problem.

I appreciate any kind of help to solve this problem.

Upvotes: 2

Views: 1257

Answers (1)

Joel Coehoorn
Joel Coehoorn

Reputation: 416019

You have this:

declare @cryptString varchar(50) 

And you try to use it to hold this value:

müller

That's bad. You need an nvarchar for anything that could go beyond basic ascii characters.

But that's just for starters. nvarchar uses UTF-16 (see the section headed "Supplementary Characters" about halfway down the page). The web site may be using UTF-32 or (likely) UTF-8 to encode those characters. Either one will use slightly different byte representations, which will produce completely different hash values.

I believe you're seeing UTF-8 at https://hashgenerator.de/, because UTF-8 matches ASCII when only using ASCII characters. With UTF-8, simple values like test would then produce the same result for both the web site and the database.

To fix this, understand the SQL hash will use either ASCII or UTF-16, and so you must change your encoding on whatever other platform you're using to match the database. The simplest option is probably to always use UTF-16 for these values, but you could also choose to stick with varchar on the database and convert your text to ascii before calculating the hash elsewhere (with the understanding that you'll lose some fidelity).

Upvotes: 1

Related Questions