Reputation: 2638
In SQL Server, is it possible to generate a GUID using a specific piece of data as an input value. For example,
DECLARE @seed1 VARCHAR(10) = 'Test'
DECLARE @seed1 VARCHAR(10) = 'Testing'
SELECT NEWID(@seed1) -- will always return the same output value
SELECT NEWID(@seed2) -- will always return the same output value, and will be different to the example above
I know this completely goes against the point of GUIDs, in that the ID would not be unique. I'm looking for a way to detect duplicate records based on certain criteria (the @seed value).
I've tried generating a VARBINARY
string using the HASHBYTES
function, however joining between tables using VARBINARY
seems extremely slow. I'm hoping to find a similar alternative that is more efficient.
Edit: for more information on why I'm looking to achieve this.
I'm looking for a fast and efficient way of detecting duplicate information that exists on two tables. For example, I have first name, last name & email. When these are concatenated, should can be used to check whether these records eexists in table A and table B.
Simply joining on these fields is possible and provides the correct result, however is quite slow. Therefore, I was hoping to find a way of transforming the data into something such as a GUID, which would make the joins much more efficient.
Upvotes: 0
Views: 2863
Reputation: 1
The simplest method;
DECLARE @seed1 VARCHAR(10) = 'Test'
DECLARE @seed2 VARCHAR(10) = 'Testing'
SELECT CAST(HASHBYTES('SHA2_512',@seed1) as uniqueidentifier)
SELECT CAST(HASHBYTES('SHA2_512',@seed2) as uniqueidentifier)
Upvotes: 0
Reputation: 5274
You should use hashbytes
and not checksum
like this:
SELECT hashbytes('MD5', 'JOHN' + ',' + 'SMITH' + ',' + '[email protected]')
Although it's only a small chance checksum can produce the same number with 2 completely different values, I've had it happen with datasets of around a million. As iamdave noted (thanks!), it's a good idea to throw in some kind delimiter (a comma in my example) so that you don't compare 'JOH' + 'NSMITH'
and 'JOHN' + 'SMITH'
as the same.
Upvotes: 2