Reputation: 3048
I am working on an ETL process (no primary key -> no transactional replication) between Oracle and SQL Server and using a MD5 hash to detect for differences between the source and the destination databases.
This works fine for those records where the data falls into the first 127 ASCII characters. But when there are any 'extended ascii'* characters such as ½
, °
, or ©
SQL Server's HASHBYTES
function hashes these characters in a non-standard way (i.e. different from Oracle's DBMS_CRYPTO.Hash
, the .Net Cryptographic library etc).
So when I run this in Oracle:
select rawtohex(
DBMS_CRYPTO.Hash (
UTL_I18N.STRING_TO_RAW ('°', 'AL32UTF8'),
2)
) from dual;
I get: 4723EB5AA8B0CD28C7E09433839B8FAE
.
And when I run this in SQL Server:
SELECT HASHBYTES('md5', '°');
I get: EC655B6DA8B9264A7C7C5E1A70642FA7
And when I run this C# code:
string password = "°";
// byte array representation of that string
byte[] encodedPassword = new UTF8Encoding().GetBytes(password);
// need MD5 to calculate the hash
byte[] hash = ((HashAlgorithm) CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);
// string representation (similar to UNIX format)
string encoded = BitConverter.ToString(hash)
// without dashes
.Replace("-", string.Empty)
// make lowercase
.ToLower();
I get 4723EB5AA8B0CD28C7E09433839B8FAE
i.e. the same as in Oracle and every online tool that I have used.
Is there any SQL-based solution to this problem or would I need to create a CLR stored procedure and hash the data there?
*I realise that the term is somewhat controversial
Upvotes: 2
Views: 1058
Reputation: 3048
I have decided to around SQL Server's treatment of extended ASCII by implementing a CLR stored procedure that use the .Net cryptographic library:
using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.SqlServer.Server;
public class Functions
{
[SqlFunction]
public static string GetMD5Hash (string input)
{
var encodedPassword = new UTF8Encoding().GetBytes(input);
var hash = ((HashAlgorithm)CryptoConfig.CreateFromName("MD5")).ComputeHash(encodedPassword);
return BitConverter.ToString(hash).Replace("-", string.Empty);
}
}
Upvotes: 0
Reputation: 7692
There is no UTF-8 support in MS SQL Server as of now. Because of this, your hashes will always differ until you switch the source string to the most common denominator, in this case UTF-16 (probably).
Upvotes: 3