Reputation: 79
I want to figure out, given a set of files, whether a change happened in any of those files.
I know that for a single file you can use this approach, which computes a checksum you can use to check whether a change happened. I.e. it returns the same value for a given file until something in that file changes, at which point it generates a different hash:
byte[] hashBytes;
using (var inputFileStream = File.OpenRead(filePath))
using (var md5 = MD5.Create())
{
    hashBytes = md5.ComputeHash(inputFileStream);
}
string s = Convert.ToBase64String(hashBytes);
Is there a way to take a collection of hash values and produce a single hash from that collection?
List<byte[]> hashCollection = SomeFunctionThatReturnsListByteArray();
//some approach that can create a hash of this
My main goal is to detect if a change happened. I don't care which file changed.
Upvotes: 1
Views: 1104
Reputation: 1
I'm looking into this issue too. One solution is to zip all the files into a single zip file and then get the checksum of that zip file.
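A minimal sketch of that idea (untested; HashViaZip, and the choice of MD5, are just placeholders), using ZipArchive from System.IO.Compression:
public static byte[] HashViaZip(IEnumerable<string> files)
{
    using var ms = new MemoryStream();
    using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (var file in files)
            zip.CreateEntryFromFile(file, Path.GetFileName(file));
    }

    // Hash the in-memory zip rather than writing it to disk.
    ms.Seek(0, SeekOrigin.Begin);
    using var md5 = MD5.Create();
    return md5.ComputeHash(ms);
}
One caveat: zip entries record last-write timestamps, so the hash can change even when the file contents have not.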
Upvotes: 0
Reputation: 81493
Hashing hashes is not optimal. However, if you don't want to hash all the files together, you can simply write your individual hashes to a memory stream and hash that.
Disregarding any other problems, conceptual or otherwise:
public static byte[] Hash(IEnumerable<byte[]> source)
{
    using var hash = SHA256.Create();
    using var ms = new MemoryStream();

    // Concatenate the individual hashes, then hash the combined bytes.
    foreach (var bytes in source)
        ms.Write(bytes, 0, bytes.Length);

    ms.Seek(0, SeekOrigin.Begin);
    return hash.ComputeHash(ms);
}
Note: I am not professing this is the best solution; it's just a solution to your immediate problem.
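For example, a hypothetical usage combining this with the MD5 snippet from the question (filePaths and ComputeMd5 are placeholders; ComputeMd5 would wrap that snippet):
List<byte[]> hashCollection = filePaths.Select(ComputeMd5).ToList(); // one hash per file
byte[] combined = Hash(hashCollection);
string s = Convert.ToBase64String(combined); // store this and compare later to detect a change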
A slightly less allocation-heavy approach:
public static byte[] Hash(IList<byte[]> source)
{
    using var hash = SHA256.Create();
    // Presize the stream to the total length so the internal buffer never has to grow.
    using var ms = new MemoryStream(source.Sum(x => x.Length));

    foreach (var bytes in source)
        ms.Write(bytes, 0, bytes.Length);

    ms.Seek(0, SeekOrigin.Begin);
    return hash.ComputeHash(ms);
}
For a multi-file hash (untested):
public static byte[] Hash(IEnumerable<string> source)
{
    using var hash = SHA256.Create();
    hash.Initialize();

    // Adjust to what is fastest for you; for an HDD 4k to 10k might be appropriate,
    // for an SSD larger will likely help.
    // Probably best to keep it under 80k so the buffer doesn't end up on the LOH (up to you).
    const int bufferSize = 1024 * 50;
    var buffer = new byte[bufferSize];

    foreach (var file in source)
    {
        using var fs = new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Delete, bufferSize, FileOptions.SequentialScan);

        var bytesRead = 0;
        while ((bytesRead = fs.Read(buffer, 0, bufferSize)) != 0)
            hash.TransformBlock(buffer, 0, bytesRead, buffer, 0);
    }

    // Finalize once, after all files have been streamed through the hash.
    hash.TransformFinalBlock(buffer, 0, 0);
    return hash.Hash;
}
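A hypothetical usage sketch (someFolder and previouslyStoredHash are placeholders). Since enumeration order affects the result, sorting the paths keeps the hash deterministic:
var files = Directory.EnumerateFiles(someFolder).OrderBy(p => p, StringComparer.Ordinal);
string current = Convert.ToBase64String(Hash(files));
bool changed = current != previouslyStoredHash; // compare against the last known value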
Upvotes: 2