Reputation: 1981
I'm writing a C# API which stores SWIFT message types. I need to write a class that takes the entire message string and creates a hash of it, then stores this hash in the database, so that when a new message is processed, it creates another hash and checks it against the ones in the database.
I have the following:
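    // Requires: using System.Security.Cryptography; using System.Text;
    public static byte[] GetHash(string inputString)
    {
        // Dispose the algorithm once the digest has been computed.
        using (HashAlgorithm algorithm = MD5.Create()) // or SHA1.Create()
        {
            return algorithm.ComputeHash(Encoding.UTF8.GetBytes(inputString));
        }
    }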
and I need to know whether this will do.
Global Comment*
I receive the files over a secure network, so we have full control over their validity. What I need to prevent is duplicate payments being made. I could split the record into its respective tag elements (SWIFT terminology) and check them individually, but each of those would then need to be compared against records in the database, and we can't afford that cost.
I need to check whether the entire message is a duplicate of a message already processed, which is why I used this approach.
Upvotes: 2
Views: 2382
Reputation: 9896
It depends on what you want to do. If you are expecting messages to never be intentionally tampered with, even CRC64 will do just fine.
If you want a .NET provided solution that is fast and provides no cryptographic security, MD5 is just fine and will work for what you need.
If you need to determine if a message is different from another, and you expect someone to tamper with the data in transit and it may potentially be modified with bit twiddling techniques to force a hash collision, you should use SHA-256 or SHA-512.
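For the tamper-resistant case, a minimal sketch of the same idea with SHA-256 might look like the following. The class name `MessageHasher` and the method name `GetSha256Hex` are made up for illustration, and returning a hex string (rather than a byte array) is only an assumption about what is convenient to store in a database column.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of the question's code.
public static class MessageHasher
{
    // Returns the SHA-256 digest of the message as 64 uppercase hex characters.
    public static string GetSha256Hex(string message)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(message));
            // BitConverter.ToString yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

Calling `MessageHasher.GetSha256Hex(rawMessage)` on two identical message strings gives the same 64-character value, so the column can be indexed and compared directly.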
Collisions shouldn't be a problem unless you are hashing billions of messages or someone is tampering with the data in transit. If someone is tampering with the data in transit, you have bigger problems.
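To put a rough number on "shouldn't be a problem", the usual birthday-bound approximation can be sketched as below. It assumes hash values are uniformly distributed and nobody is crafting collisions, which is only an approximation for a checksum such as CRC64. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for a back-of-the-envelope estimate.
public static class CollisionEstimate
{
    // Birthday-bound approximation: p ~ 1 - exp(-n^2 / 2^(bits + 1)).
    public static double Probability(double messageCount, int hashBits)
    {
        double x = messageCount * messageCount / Math.Pow(2.0, hashBits + 1);
        // For tiny x, 1 - exp(-x) rounds to 0 in double precision,
        // and x itself is the accurate value there.
        return x < 1e-8 ? x : 1.0 - Math.Exp(-x);
    }
}
```

Under those assumptions, `CollisionEstimate.Probability(1e9, 128)` (a billion messages, 128-bit hash such as MD5) comes out around 1.5e-21, while `CollisionEstimate.Probability(1e9, 64)` is roughly 0.027, so a 64-bit value starts to show accidental collisions at that scale.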
Upvotes: 3
Reputation: 10045
You could implement it the way that Dictionary implements it: the bucket system.
Store a hash value in the database alongside the raw data.
----------------
| Hash | Value |
----------------
Searching on the hash first makes the query faster, and if there are multiple hits, as there will be at some point with MD5, you can just iterate through them and compare the raw values to see whether they really are the same.
But as Michael J. Gray says, the probability of a collision is very small on smaller datasets.
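The bucket idea can be sketched in memory as follows. This is only an illustration: the `DuplicateDetector` class is hypothetical, and the `Dictionary` stands in for the Hash | Value table, where a real implementation would query an indexed hash column instead.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// In-memory stand-in for the Hash | Value table (hypothetical class).
public class DuplicateDetector
{
    private readonly Dictionary<string, List<string>> buckets =
        new Dictionary<string, List<string>>();

    private static string HashOf(string message)
    {
        using (MD5 md5 = MD5.Create())
        {
            return Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(message)));
        }
    }

    // Returns true if the exact message was seen before;
    // otherwise records it and returns false.
    public bool IsDuplicate(string message)
    {
        string key = HashOf(message);
        List<string> bucket;
        if (!buckets.TryGetValue(key, out bucket))
        {
            bucket = new List<string>();
            buckets[key] = bucket;
        }
        // A hash hit alone is not proof: compare the raw text to rule out a collision.
        foreach (string stored in bucket)
        {
            if (string.Equals(stored, message, StringComparison.Ordinal))
                return true;
        }
        bucket.Add(message);
        return false;
    }
}
```

The first call to `IsDuplicate` with a given message returns false and stores it; a second call with the same text returns true. Two different messages that happened to share a hash would land in the same bucket but still be told apart by the raw comparison.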
Upvotes: 3