broke

Reputation: 8302

Calculate MD5 checksum for a file

I'm using iTextSharp to read the text from a PDF file. However, sometimes I cannot extract any text because the PDF contains only images. I download the same PDF files every day, and I want to see whether a PDF has been modified. If the text and modification date cannot be obtained, is an MD5 checksum the most reliable way to tell if the file has changed?

If it is, some code samples would be appreciated, because I don't have much experience with cryptography.

Upvotes: 425

Views: 442772

Answers (7)

StudioLE

Reputation: 732

In addition to the methods in the other answers: if you're comparing PDFs, you need to set the creation and modification dates to constant values, or the hashes won't match.

For PDFs generated with QuestPDF, you'll need to override CreationDate and ModifiedDate in the document metadata.

public class PdfDocument : IDocument
{
    ...

    // Must be public to implement IDocument.GetMetadata implicitly.
    public DocumentMetadata GetMetadata()
    {
        return new()
        {
            CreationDate = DateTime.MinValue,
            ModifiedDate = DateTime.MinValue,
        };
    }
    
    ...
}

https://www.questpdf.com/concepts/document-metadata.html

Upvotes: -1

Khalil

Reputation: 1107

For dynamically generated PDFs, the creation and modification dates will always be different.

You have to remove them or set them to a constant value.

Then generate the MD5 hash of each file and compare the hashes.

You can use iTextSharp's PdfStamper to remove or update the dates.

Upvotes: 1

Romil Kumar Jain

Reputation: 20745

I know I'm late to the party, but I ran a test before actually implementing the solution.

I tested the built-in MD5 class against md5sum.exe. In my case the built-in class took 13 seconds, whereas md5sum.exe took around 16-18 seconds on every run.

    DateTime current = DateTime.Now;
    string file = @"C:\text.iso"; // a 2.5 GB file
    string output;
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(file))
        {
            byte[] checksum = md5.ComputeHash(stream);
            output = BitConverter.ToString(checksum).Replace("-", String.Empty).ToLower();
            Console.WriteLine("Total seconds : " + (DateTime.Now - current).TotalSeconds.ToString() + " " + output);
        }
    }
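For this kind of measurement, System.Diagnostics.Stopwatch is generally a better fit than subtracting DateTime.Now values. Below is a minimal sketch of the same test; the Md5Timing class and TimeMD5 method names are only illustrative and do not come from the original test.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

static class Md5Timing
{
    // Returns the lowercase hex MD5 of the file and how long the hashing took.
    public static (string Hash, TimeSpan Elapsed) TimeMD5(string file)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(file))
        {
            byte[] checksum = md5.ComputeHash(stream);
            stopwatch.Stop();
            string hash = BitConverter.ToString(checksum)
                .Replace("-", string.Empty)
                .ToLowerInvariant();
            return (hash, stopwatch.Elapsed);
        }
    }
}
```

Calling Md5Timing.TimeMD5(@"C:\text.iso") and printing Elapsed.TotalSeconds gives the same kind of figure as the snippet above. Disk caching can still skew repeated runs.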

Upvotes: 5

Jon Skeet

Reputation: 1499760

It's very simple using System.Security.Cryptography.MD5:

using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(filename))
    {
        return md5.ComputeHash(stream);
    }
}

(I believe that actually the MD5 implementation used doesn't need to be disposed, but I'd probably still do so anyway.)

How you compare the results afterwards is up to you; you can convert the byte array to base64 for example, or compare the bytes directly. (Just be aware that arrays don't override Equals. Using base64 is simpler to get right, but slightly less efficient if you're really only interested in comparing the hashes.)
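To illustrate the comparison point, here is a minimal sketch of both options. The HashComparison class and its method names are only illustrative; SequenceEqual comes from System.Linq.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashComparison
{
    // Streams the file through MD5, so the file is never fully loaded into memory.
    public static byte[] ComputeMD5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            return md5.ComputeHash(stream);
        }
    }

    // Compares the bytes element by element.
    // a.Equals(b) or a == b would only compare array references.
    public static bool HashesEqual(byte[] a, byte[] b)
    {
        return a.SequenceEqual(b);
    }

    // Alternative: compare the base64 text forms, which use value equality.
    public static bool HashesEqualBase64(byte[] a, byte[] b)
    {
        return Convert.ToBase64String(a) == Convert.ToBase64String(b);
    }
}
```

Both comparison methods should give the same answer; the base64 form is mainly convenient when the hash is going to be stored as text anyway.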

If you need to represent the hash as a string, you could convert it to hex using BitConverter:

static string CalculateMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
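For the daily-download scenario in the question, one possible approach is to store the previous hex hash next to the file and compare it on the next run. This is only a sketch: the ChangeDetector class, the HasChanged method, and the ".md5" sidecar file are assumptions made for illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ChangeDetector
{
    static string CalculateMD5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Returns true when the file's hash differs from the one recorded on the
    // previous call (or when nothing was recorded yet), then records the new hash.
    public static bool HasChanged(string filename)
    {
        string hashFile = filename + ".md5"; // hypothetical sidecar location
        string current = CalculateMD5(filename);
        string previous = File.Exists(hashFile)
            ? File.ReadAllText(hashFile).Trim()
            : null;
        File.WriteAllText(hashFile, current);
        return !string.Equals(previous, current, StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that the first call for a given file reports a change, because no earlier hash exists to compare against.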

Upvotes: 964

BoliBerrys

Reputation: 863

This is how I do it:

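using System;
using System.IO;
using System.Security.Cryptography;

public string checkMD5(string filename)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            // Hex-encode the hash bytes. Decoding them with a text encoding
            // (e.g. Encoding.Default.GetString) is lossy and can produce unprintable characters.
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}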

Upvotes: 79

Ashley Davis

Reputation: 10040

Here is a slightly simpler version that I found. It reads the entire file into memory in one go and needs only a single using statement, so it is best suited to files that fit comfortably in memory.

byte[] ComputeHash(string filePath)
{
    using (var md5 = MD5.Create())
    {
        return md5.ComputeHash(File.ReadAllBytes(filePath));
    }
}
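Assuming .NET 5 or later (an assumption; the question does not say which framework is in use), the static MD5.HashData and Convert.ToHexString methods make this shorter still. A minimal sketch, with OneShotHash and ComputeHashHex as illustrative names:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class OneShotHash
{
    // Reads the whole file into memory and hashes it in a single call.
    // Requires .NET 5 or later for MD5.HashData and Convert.ToHexString.
    public static string ComputeHashHex(string filePath)
    {
        byte[] hash = MD5.HashData(File.ReadAllBytes(filePath));
        return Convert.ToHexString(hash); // uppercase hex digits
    }
}
```

As with the version above, the whole file is held in memory, so a streaming approach is probably the safer choice for large files.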

Upvotes: 5

Badaro Jr.

Reputation: 304

I know this question was already answered, but this is what I use:

using (FileStream fStream = File.OpenRead(filename)) {
    return GetHash<MD5>(fStream);
}

Where GetHash:

public static String GetHash<T>(Stream stream) where T : HashAlgorithm {
    StringBuilder sb = new StringBuilder();

    MethodInfo create = typeof(T).GetMethod("Create", new Type[] {});
    using (T crypt = (T) create.Invoke(null, null)) {
        byte[] hashBytes = crypt.ComputeHash(stream);
        foreach (byte bt in hashBytes) {
            sb.Append(bt.ToString("x2"));
        }
    }
    return sb.ToString();
}

Probably not the best way, but it can be handy.
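If the reflection call is a concern, one alternative is to pass in a factory delegate instead of a type parameter. This is a sketch of a different technique rather than the code above; HashHelper and the createAlgorithm parameter are names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class HashHelper
{
    // The caller supplies the algorithm through a factory delegate,
    // so no reflection is needed to find a Create method.
    public static string GetHash(Stream stream, Func<HashAlgorithm> createAlgorithm)
    {
        var sb = new StringBuilder();
        using (HashAlgorithm crypt = createAlgorithm())
        {
            foreach (byte bt in crypt.ComputeHash(stream))
            {
                sb.Append(bt.ToString("x2")); // two lowercase hex digits per byte
            }
        }
        return sb.ToString();
    }
}
```

It would be called as HashHelper.GetHash(fStream, () => MD5.Create()), and swapping in () => SHA256.Create() changes the algorithm without touching the helper.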

Upvotes: 11
