Reputation: 20100
I have to find if two dataset are identical without using primary key, only data.
If it exist, I have to return the primary key otherwise I have to insert a new row and return the new primary key
Instead of comparing each field individually, since I think this could be slow????, I came up with this solution
(I'm going to take an address as an example)
On the SQL Server side, a persisted
computed column
with an index
on it;
-- nice new function in 2012; concat
hashbytes('SHA2_256',
concat(
nchar(0x2460),[Address_Type_Id],
nchar(0x2461),[Street_Name],
nchar(0x2462),[Municipality],
nchar(0x2463),[Postal_Zip_Code],
nchar(0x2464),[Unit],
nchar(0x2465),[Street_Number],
nchar(0x2466),[PO_Box],
nchar(0x2467),[Rural_Route_Number],
nchar(0x2468),[Street_Type_Id],
nchar(0x2469),[Country_Type_Id],
nchar(0x246A),[Country_Division_Type_Id],
nchar(0x246B),[Country_Division_Other],
nchar(0x246C),[Direction_Type_Id]
)
)
on the .NET side;
using(var mySHA256 = System.Security.Cryptography.SHA256Managed.Create())
{
var Value = string.Concat(
(char)0x2460, PartyAddress.AddressTypeId,
(char)0x2461, PartyAddress.StreetName,
(char)0x2462, PartyAddress.Municipality,
(char)0x2463, PartyAddress.PostalZip,
(char)0x2464, PartyAddress.Unit,
(char)0x2465, PartyAddress.StreetNumber,
(char)0x2466, PartyAddress.POBOX,
(char)0x2467, PartyAddress.RuralRouteNumber,
(char)0x2468, PartyAddress.StreetTypeId,
(char)0x2469, PartyAddress.CountryTypeId,
(char)0x246A, PartyAddress.CountryDivisionTypeId,
(char)0x246B, PartyAddress.CountryDivisionOther,
(char)0x246C, PartyAddress.DirectionTypeId);
var hashValue = mySHA256.ComputeHash(Encoding.Unicode.GetBytes(Value));
}
After that I can now compare hashValue
and the computed column
and it does work.
My question is, before I start implementing this on MANY other table, should I look at another, better, solution?
EDIT **************
to add another solution, field by field comparison, since I'm using entity framework would be like this
var addressExist = Tombstone.Addresses
.FirstOrDefault(x =>
x.Address_Type_Id == PartyAddress.AddressTypeId &&
x.Street_Name == PartyAddress.StreetName &&
x.Municipality == PartyAddress.Municipality &&
x.Postal_Zip_Code == PartyAddress.PostalZip &&
x.Unit == PartyAddress.Unit &&
x.Street_Number == PartyAddress.StreetNumber &&
x.PO_Box == PartyAddress.POBOX &&
x.Rural_Route_Number == PartyAddress.RuralRouteNumber &&
x.Street_Type_Id == PartyAddress.StreetTypeId &&
x.Country_Type_Id == PartyAddress.CountryTypeId &&
x.Country_Division_Type_Id == PartyAddress.CountryDivisionTypeId &&
x.Country_Division_Other == PartyAddress.CountryDivisionOther &&
x.Direction_Type_Id == PartyAddress.DirectionTypeId);
which generate this query
SELECT TOP (1) [Extent1].[Address_Id] AS [Address_Id]
FROM [dbo].[Address] AS [Extent1]
WHERE ([Extent1].[Address_Type_Id] = @p__linq__0) AND
(([Extent1].[Street_Name] = @p__linq__1) OR (([Extent1].[Street_Name] IS NULL) AND (@p__linq__1 IS NULL))) AND
(([Extent1].[Municipality] = @p__linq__2) OR (([Extent1].[Municipality] IS NULL) AND (@p__linq__2 IS NULL))) AND
(([Extent1].[Postal_Zip_Code] = @p__linq__3) OR (([Extent1].[Postal_Zip_Code] IS NULL) AND (@p__linq__3 IS NULL))) AND
(([Extent1].[Unit] = @p__linq__4) OR (([Extent1].[Unit] IS NULL) AND (@p__linq__4 IS NULL))) AND
(([Extent1].[Street_Number] = @p__linq__5) OR (([Extent1].[Street_Number] IS NULL) AND (@p__linq__5 IS NULL))) AND
(([Extent1].[PO_Box] = @p__linq__6) OR (([Extent1].[PO_Box] IS NULL) AND (@p__linq__6 IS NULL))) AND
(([Extent1].[Rural_Route_Number] = @p__linq__7) OR (([Extent1].[Rural_Route_Number] IS NULL) AND (@p__linq__7 IS NULL))) AND
(([Extent1].[Street_Type_Id] = @p__linq__8) OR (([Extent1].[Street_Type_Id] IS NULL) AND (@p__linq__8 IS NULL))) AND
(([Extent1].[Country_Type_Id] = @p__linq__9) OR (([Extent1].[Country_Type_Id] IS NULL) AND (@p__linq__9 IS NULL))) AND
(([Extent1].[Country_Division_Type_Id] = @p__linq__10) OR (([Extent1].[Country_Division_Type_Id] IS NULL) AND (@p__linq__10 IS NULL))) AND
(([Extent1].[Country_Division_Other] = @p__linq__11) OR (([Extent1].[Country_Division_Other] IS NULL) AND (@p__linq__11 IS NULL))) AND
(([Extent1].[Direction_Type_Id] = @p__linq__12) OR (([Extent1].[Direction_Type_Id] IS NULL) AND (@p__linq__12 IS NULL)))
Upvotes: 3
Views: 102
Reputation: 76208
If you think comparing each field is going to be slow, consider your alternative solution which comprises of:
So you just made 2n operation to 2 x (2n + 2n + hash(n)). Plus you're creating lots of string each of which is adding to CPU cycles and memory (strings are immutable in .Net).
You can clearly see which one is going to be slower.
Update:
Upvotes: 2