Igor Pistolyaka

Reputation: 365

How do you create the hash of a folder in C#?

I need to create a hash for a folder that contains some files. I've already done this for each individual file, but I'm looking for a way to create a single hash for all the files in a folder. Any ideas on how to do that?

(Of course, I could hash each file and concatenate the results into one big hash, but I don't like that approach.)

Upvotes: 27

Views: 35985

Answers (7)

Dunc

Reputation: 18932

This hashes all file (relative) paths and contents, and correctly handles file ordering.

And it's quick - like 30ms for a 4MB directory.

using System;
using System.Text;
using System.Security.Cryptography;
using System.IO;
using System.Linq;

...

public static string CreateMd5ForFolder(string path)
{
    // assuming you want to include nested folders
    var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories)
                         .OrderBy(p => p).ToList();

    MD5 md5 = MD5.Create();

    for(int i = 0; i < files.Count; i++)
    {
        string file = files[i];
        
        // hash path
        string relativePath = file.Substring(path.Length + 1);
        byte[] pathBytes = Encoding.UTF8.GetBytes(relativePath.ToLower());
        md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);
        
        // hash contents
        byte[] contentBytes = File.ReadAllBytes(file);
        if (i == files.Count - 1)
            md5.TransformFinalBlock(contentBytes, 0, contentBytes.Length);
        else
            md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
    }
    
    return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
}

Upvotes: 40

Ronnie Overby

Reputation: 46480

Here's a solution that uses streaming to avoid memory and latency issues.

By default, the file paths are included in the hash, so the result reflects not only the data in the files but also the file system entries themselves. This reduces the chance that two different folder layouts produce the same hash. The question is tagged security, so that ought to matter.

Finally, this solution puts you in control of the hashing algorithm, which files get hashed, and in what order.

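using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

public static class HashAlgorithmExtensions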
{
    public static async Task<byte[]> ComputeHashAsync(this HashAlgorithm alg, IEnumerable<FileInfo> files, bool includePaths = true)
    {
        using (var cs = new CryptoStream(Stream.Null, alg, CryptoStreamMode.Write))
        {
            foreach (var file in files)
            {
                if (includePaths)
                {
                    var pathBytes = Encoding.UTF8.GetBytes(file.FullName);
                    cs.Write(pathBytes, 0, pathBytes.Length);
                }

                using (var fs = file.OpenRead())
                    await fs.CopyToAsync(cs);
            }

            cs.FlushFinalBlock();
        }

        return alg.Hash;
    }
}

An example that hashes all the files in a folder:

async Task<byte[]> HashFolder(DirectoryInfo folder, string searchPattern = "*", SearchOption searchOption = SearchOption.TopDirectoryOnly)
{
    using(var alg = MD5.Create())
        return await alg.ComputeHashAsync(folder.EnumerateFiles(searchPattern, searchOption));
}

Upvotes: 10

Igor Krupitsky

Reputation: 885

A quick and dirty folder hash (in VB.NET) that does not descend into subfolders or read file contents. It is based only on the names of the files and subfolders directly inside the folder.

Public Function GetFolderHash(ByVal sFolder As String) As String
    ' Sort by name (not by name length) so the order is deterministic
    Dim oFiles As List(Of String) = IO.Directory.GetFiles(sFolder).OrderBy(Function(x) x).ToList()
    Dim oFolders As List(Of String) = IO.Directory.GetDirectories(sFolder).OrderBy(Function(x) x).ToList()
    oFiles.AddRange(oFolders)

    If oFiles.Count = 0 Then
        Return ""
    End If

    Dim oDM5 As System.Security.Cryptography.MD5 = System.Security.Cryptography.MD5.Create()
    For i As Integer = 0 To oFiles.Count - 1
        Dim sFile As String = oFiles(i)
        Dim sRelativePath As String = sFile.Substring(sFolder.Length + 1)
        Dim oPathBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(sRelativePath.ToLower())

        If i = oFiles.Count - 1 Then
            oDM5.TransformFinalBlock(oPathBytes, 0, oPathBytes.Length)
        Else
            oDM5.TransformBlock(oPathBytes, 0, oPathBytes.Length, oPathBytes, 0)
        End If
    Next

    Return BitConverter.ToString(oDM5.Hash).Replace("-", "").ToLower()
End Function

Upvotes: -1

Blake Biggs

Reputation: 201

Dunc's answer works well; however, it does not handle an empty directory. The code below returns the MD5 'd41d8cd98f00b204e9800998ecf8427e' (the MD5 of zero-length input) for an empty directory.

public static string CreateDirectoryMd5(string srcPath)
{
    var filePaths = Directory.GetFiles(srcPath, "*", SearchOption.AllDirectories).OrderBy(p => p).ToArray();

    using (var md5 = MD5.Create())
    {
        foreach (var filePath in filePaths)
        {
            // hash path
            byte[] pathBytes = Encoding.UTF8.GetBytes(filePath);
            md5.TransformBlock(pathBytes, 0, pathBytes.Length, pathBytes, 0);

            // hash contents
            byte[] contentBytes = File.ReadAllBytes(filePath);

            md5.TransformBlock(contentBytes, 0, contentBytes.Length, contentBytes, 0);
        }

        //Handles empty filePaths case
        md5.TransformFinalBlock(new byte[0], 0, 0);

        return BitConverter.ToString(md5.Hash).Replace("-", "").ToLower();
    }
}

Upvotes: 20

Paul Ruane

Reputation: 38610

Create a tarball of the files, then hash the tarball.

> tar cf hashes *.abc
> md5sum hashes

Or hash the individual files and pipe the output into the hash command.

> md5sum *.abc | md5sum

Edit: neither approach above sorts the files, so the hash may differ between invocations, depending on how the shell expands the asterisk.

Upvotes: 7

Sam Saffron

Reputation: 131142

If you already have hashes for all the files, just sort the hashes alphabetically, concatenate them and hash them again to create an uber hash.
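The sort-concatenate-rehash idea above can be sketched roughly as follows. This is only an illustration: the class name `CombinedHash` and the method `Combine` are made up for this example, and it assumes the per-file hashes are fixed-length hex strings (such as MD5 output), so they can be concatenated without a separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of any library.
public static class CombinedHash
{
    // Combines per-file hashes (hex strings) into one hash that does not
    // depend on the order in which the files were enumerated.
    public static string Combine(IEnumerable<string> fileHashes)
    {
        // Normalize case, then sort ordinally so the result does not
        // depend on the current culture.
        var sorted = fileHashes
            .Select(h => h.ToLowerInvariant())
            .OrderBy(h => h, StringComparer.Ordinal);

        // Assumption: all hashes have the same length, so plain
        // concatenation is unambiguous.
        byte[] input = Encoding.ASCII.GetBytes(string.Concat(sorted));

        using (var md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(input))
                               .Replace("-", "")
                               .ToLowerInvariant();
        }
    }
}
```

Note that this variant looks only at file contents: renaming a file without changing it leaves the combined hash unchanged. If names matter, they would have to be hashed as well.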

Upvotes: 4

aularon

Reputation: 11110

Concatenate the filenames and file contents into one big string and hash that, or do the hashing in chunks for better performance.

Of course, you need to take a few things into account:

  • You need to sort the files by name, so you don't get two different hashes when the file order changes.
  • With this method, only the filenames and contents are taken into account. If the filenames don't matter, you can sort by content first and then hash. If more attributes (ctime/mtime/hidden/archived, etc.) matter, include them in the string to be hashed.
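As a rough sketch of the chunked variant described above, the following feeds each file's relative name and contents into a single running hash, reading the contents through a fixed-size buffer. The names `FolderHasher`, `HashFolder`, and `includeLastWriteTime` are invented for this example. It assumes `IncrementalHash` is available (.NET Core, or .NET Framework 4.6.2 and later) and uses MD5 only to match the other answers.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, not part of any library.
public static class FolderHasher
{
    public static string HashFolder(string root, bool includeLastWriteTime = false)
    {
        // Sort by name (ordinal) so enumeration order cannot change the hash.
        var files = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                             .OrderBy(p => p, StringComparer.Ordinal);

        using (var hash = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            var buffer = new byte[81920];
            foreach (string file in files)
            {
                // Hash the name relative to the root folder.
                string relative = file.Substring(root.Length)
                    .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);
                hash.AppendData(Encoding.UTF8.GetBytes(relative));

                // Optionally include an extra attribute (here: mtime).
                if (includeLastWriteTime)
                    hash.AppendData(BitConverter.GetBytes(File.GetLastWriteTimeUtc(file).Ticks));

                // Hash the contents in chunks instead of loading the whole file.
                using (var stream = File.OpenRead(file))
                {
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                        hash.AppendData(buffer, 0, read);
                }
            }

            return BitConverter.ToString(hash.GetHashAndReset())
                               .Replace("-", "")
                               .ToLowerInvariant();
        }
    }
}
```

Because names and contents are simply appended one after another, a file named "ab" containing "c" feeds the same bytes as a file named "a" containing "bc". A stricter version would prefix each field with its length.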

Upvotes: 1
