Reputation: 365
I need to create the hash for a folder that contains some files. I've already done this task for each of the files, but I'm searching for a way to create one hash for all files in a folder. Any ideas on how to do that?
(Of course, I could create the hash for each file and combine them into one big hash, but that's not an approach I like.)
Upvotes: 27
Views: 35985
Reputation: 18932
This hashes all file paths (relative to the folder) and their contents, and handles file ordering deterministically.
It's also quick: about 30 ms for a 4 MB directory.
using System;
using System.Text;
using System.Security.Cryptography;
using System.IO;
using System.Linq;
...
public static string CreateMd5ForFolder(string path)
{
    // assuming you want to include nested folders;
    // ordinal sort so the order does not depend on the current culture
    var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories)
                         .OrderBy(p => p, StringComparer.Ordinal).ToList();

    using (MD5 md5 = MD5.Create())
    {
        for (int i = 0; i < files.Count; i++)
        {
            string file = files[i];

            // hash path (relative; assumes `path` has no trailing separator)
            string relativePath = file.Substring(path.Length + 1);
            byte[] pathBytes = Encoding.UTF8.GetBytes(relativePath.ToLowerInvariant());
            md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);

            // hash contents
            byte[] contentBytes = File.ReadAllBytes(file);
            if (i == files.Count - 1)
                md5.TransformFinalBlock(contentBytes, 0, contentBytes.Length);
            else
                md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
        }

        // note: for an empty folder TransformFinalBlock is never called,
        // so md5.Hash is not available here and this line throws
        return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
    }
}
Upvotes: 40
Reputation: 46480
Here's a solution that uses streaming to avoid memory and latency issues.
By default the file paths are included in the hash, so it covers not only the data in the files but also the file system entries themselves. This keeps folders that hold the same data under different names from hashing identically. This post is tagged security, so this ought to be important.
Finally, this solution puts you in control of the hashing algorithm, which files get hashed, and in what order.
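using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class HashAlgorithmExtensions
{
    public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm alg, IEnumerable<FileInfo> files, bool includePaths = true)
    {
        // Stream.Null discards the output; the CryptoStream only feeds the bytes into alg
        using (var cs = new CryptoStream(Stream.Null, alg, CryptoStreamMode.Write))
        {
            foreach (var file in files)
            {
                if (includePaths)
                {
                    // note: FullName is an absolute path, so moving the folder changes the hash
                    var pathBytes = Encoding.UTF8.GetBytes(file.FullName);
                    cs.Write(pathBytes, 0, pathBytes.Length);
                }

                using (var fs = file.OpenRead())
                    await fs.CopyToAsync(cs);
            }

            cs.FlushFinalBlock();
        }

        return alg.Hash;
    }
}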
An example that hashes all the files in a folder:
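async Task<byte[]> HashFolder(DirectoryInfo folder, string searchPattern = "*", SearchOption searchOption = SearchOption.TopDirectoryOnly)
{
    // EnumerateFiles does not guarantee an order, so sort to get a stable hash
    var files = folder.EnumerateFiles(searchPattern, searchOption)
                      .OrderBy(f => f.FullName, StringComparer.Ordinal);

    using (var alg = MD5.Create())
        return await alg.ComputeHashAsync(files);
}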
Upvotes: 10
Reputation: 885
Quick and dirty folder hash that does not descend into subfolders or read file contents. It is based only on the file and subfolder names.
Public Function GetFolderHash(ByVal sFolder As String) As String
    ' Sort by name (ordinal) so the result does not depend on enumeration order
    Dim oFiles As List(Of String) = IO.Directory.GetFiles(sFolder).OrderBy(Function(x) x, StringComparer.Ordinal).ToList()
    Dim oFolders As List(Of String) = IO.Directory.GetDirectories(sFolder).OrderBy(Function(x) x, StringComparer.Ordinal).ToList()
    oFiles.AddRange(oFolders)
    If oFiles.Count = 0 Then
        Return ""
    End If

    Dim oDM5 As System.Security.Cryptography.MD5 = System.Security.Cryptography.MD5.Create()
    For i As Integer = 0 To oFiles.Count - 1
        Dim sFile As String = oFiles(i)
        ' Hash only the name relative to sFolder, not the file contents
        Dim sRelativePath As String = sFile.Substring(sFolder.Length + 1)
        Dim oPathBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(sRelativePath.ToLower())
        If i = oFiles.Count - 1 Then
            oDM5.TransformFinalBlock(oPathBytes, 0, oPathBytes.Length)
        Else
            oDM5.TransformBlock(oPathBytes, 0, oPathBytes.Length, oPathBytes, 0)
        End If
    Next
    Return BitConverter.ToString(oDM5.Hash).Replace("-", "").ToLower()
End Function
Upvotes: -1
Reputation: 201
Dunc's answer works well; however, it does not handle an empty directory. The code below returns the MD5 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of zero-length input) for an empty directory.
public static string CreateDirectoryMd5(string srcPath)
{
    var filePaths = Directory.GetFiles(srcPath, "*", SearchOption.AllDirectories)
                             .OrderBy(p => p, StringComparer.Ordinal).ToArray();

    using (var md5 = MD5.Create())
    {
        foreach (var filePath in filePaths)
        {
            // hash path (note: the full path, unlike Dunc's relative path,
            // so the result changes if the folder is moved or renamed)
            byte[] pathBytes = Encoding.UTF8.GetBytes(filePath);
            md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);

            // hash contents
            byte[] contentBytes = File.ReadAllBytes(filePath);
            md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
        }

        // Handles the empty filePaths case: finalizing with no extra data still produces a hash
        md5.TransformFinalBlock(new byte[0], 0, 0);

        return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
    }
}
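As a quick check of the constant quoted above, finalizing an MD5 instance without feeding it any data gives exactly that value. The class and method names below are illustrative only.

```csharp
using System;
using System.Security.Cryptography;

public static class EmptyMd5Check
{
    // Returns the lowercase hex MD5 of zero bytes of input.
    public static string EmptyInputMd5()
    {
        using (var md5 = MD5.Create())
        {
            // Same finalization call the answer uses for the empty-folder case.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
        }
    }
}
```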
Upvotes: 20
Reputation: 38610
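Create a tarball of the files and hash the tarball:
> tar cf hashes *.abc
> md5sum hashes
Or hash the individual files and pipe the output into another hash command:
> md5sum *.abc | md5sum
Edit: neither approach sorts the files explicitly. The order comes from how the shell expands the asterisk (for example, it depends on the locale's collation order), so the hash may differ between machines or shell settings. The tarball also records metadata such as modification times and ownership, so its hash can change even when the file contents do not.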
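For comparison, here is a rough C# analogue of the second pipeline (`md5sum *.abc | md5sum`): hash each file, write one `hash  name` line per file, then hash that listing. It is a sketch, not a byte-for-byte reimplementation of md5sum; the `HashOfHashes.Md5OfMd5s` name and the exact line format are assumptions, and the files are sorted ordinally so the order no longer depends on the shell.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class HashOfHashes
{
    // Hypothetical helper: hashes each file in `folder` matching `pattern`,
    // then hashes the "<hex>  <name>\n" lines, similar in spirit to
    // `md5sum *.abc | md5sum`. Files are sorted ordinally by name.
    public static string Md5OfMd5s(string folder, string pattern)
    {
        var files = Directory.GetFiles(folder, pattern)
                             .OrderBy(f => Path.GetFileName(f), StringComparer.Ordinal);

        var listing = new StringBuilder();
        using (var md5 = MD5.Create())
        {
            foreach (var file in files)
            {
                byte[] fileHash;
                using (var stream = File.OpenRead(file))
                    fileHash = md5.ComputeHash(stream); // ComputeHash resets md5 afterwards

                string hex = BitConverter.ToString(fileHash).Replace("-", "").ToLowerInvariant();
                listing.Append(hex).Append("  ").Append(Path.GetFileName(file)).Append('\n');
            }

            // Second pass: hash the listing itself.
            byte[] total = md5.ComputeHash(Encoding.UTF8.GetBytes(listing.ToString()));
            return BitConverter.ToString(total).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Like the shell version, this only looks at files directly in `folder`, so subfolders are ignored.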
Upvotes: 7
Reputation: 131142
If you already have hashes for all the files, just sort the hashes alphabetically, concatenate them and hash them again to create an uber hash.
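A minimal sketch of that idea, assuming the per-file hashes are already available as hex strings. The `UberHash.Combine` name and the choice of SHA-256 for the outer hash are assumptions; any hash algorithm works. Plain concatenation is unambiguous here only because every input hash has the same length.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class UberHash
{
    // Combines per-file hashes (hex strings) into one folder hash:
    // normalize case, sort ordinally, concatenate, and hash the result.
    public static string Combine(IEnumerable<string> fileHashesHex)
    {
        string joined = string.Concat(
            fileHashesHex.Select(h => h.ToLowerInvariant())
                         .OrderBy(h => h, StringComparer.Ordinal));

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.ASCII.GetBytes(joined));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Because the hashes are sorted rather than the file names, renaming a file or swapping the contents of two files leaves the result unchanged; include the names in the combined input if that matters.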
Upvotes: 4
Reputation: 11110
Concatenate the file names and file contents into one big string and hash that, or hash them in chunks for better performance.
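One way to do the chunked variant is to stream everything into a single incremental hash instead of building the big string. This sketch assumes .NET's `IncrementalHash` (available in .NET Core and newer .NET Framework versions), SHA-256, and a hypothetical `ChunkedFolderHash.Compute` helper. It prefixes every path and every file with its length so that a name and the data after it cannot run together; otherwise the file `ab` containing `c` and the file `a` containing `bc` would feed identical bytes to the hash.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class ChunkedFolderHash
{
    // Hypothetical helper: streams every file under `root` into one SHA-256 hash.
    // Each entry is written as [path length][relative path][content length][content].
    public static string Compute(string root, int chunkSize = 81920)
    {
        string fullRoot = Path.GetFullPath(root);
        var files = Directory.GetFiles(fullRoot, "*", SearchOption.AllDirectories)
            .Select(f => new
            {
                Full = f,
                // relative path with '/' separators, independent of the platform
                Relative = f.Substring(fullRoot.Length)
                            .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
                            .Replace(Path.DirectorySeparatorChar, '/')
            })
            .OrderBy(f => f.Relative, StringComparer.Ordinal);

        using (var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
        {
            var buffer = new byte[chunkSize];
            foreach (var f in files)
            {
                byte[] pathBytes = Encoding.UTF8.GetBytes(f.Relative);
                hash.AppendData(LittleEndian(pathBytes.Length));
                hash.AppendData(pathBytes);

                using (var stream = File.OpenRead(f.Full))
                {
                    hash.AppendData(LittleEndian(stream.Length));
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                        hash.AppendData(buffer, 0, read); // one chunk at a time
                }
            }

            // Also works for an empty folder: this is then the hash of no input.
            return BitConverter.ToString(hash.GetHashAndReset()).Replace("-", "").ToLowerInvariant();
        }
    }

    // Fixed 8-byte little-endian encoding so the hash does not depend on CPU byte order.
    private static byte[] LittleEndian(long value)
    {
        var bytes = new byte[8];
        for (int i = 0; i < 8; i++)
            bytes[i] = (byte)(value >> (8 * i));
        return bytes;
    }
}
```

Sorting the relative paths ordinally makes the result independent of enumeration order, and only one chunk of a file is held in memory at a time.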
Of course, you need to take a few things into account:
Upvotes: 1