Simon Farrow

Reputation: 1921

Compare binary files in C#

I want to compare two binary files. One of them is already stored on the server with a pre-calculated CRC32 in the database from when I stored it originally.

I know that if the CRC is different, then the files are definitely different. However, if the CRC is the same, I don't know that the files are. So, I'm looking for a nice efficient way of comparing the two streams: one from the posted file and one from the file system.

I'm not an expert on streams, but I'm well aware that I could easily shoot myself in the foot here as far as memory usage is concerned.

Upvotes: 34

Views: 41412

Answers (8)

Chizl

Reputation: 139

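This is how I do it today, with no explicit loops: if both files exist and have the same length, compare their MD5 hashes. Note that both files are still read in full. Hope this provides an alternative option.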

public class FileCompare
{
    public bool IsFileSame(string filePath1, string filePath2) => 
        IsFileSame(new FileInfo(filePath1), new FileInfo(filePath2));

    public bool IsFileSame(FileInfo filePath1, FileInfo filePath2)
    {
        var retVal = false;

        if (filePath1.Exists && 
            filePath2.Exists && 
            filePath1.Length == filePath2.Length)
        {
            using (FileStream inputStream1 = File.OpenRead(filePath1.FullName))
            {
                using (FileStream inputStream2 = File.OpenRead(filePath2.FullName))
                {
                    using (MD5 mD = MD5.Create())
                    {
                        retVal = BitConverter.ToString(mD.ComputeHash(inputStream1))
                            .Equals(BitConverter.ToString(mD.ComputeHash(inputStream2)));
                    }
                }
            }
        }

        return retVal;
    }
}

Upvotes: 0

Łukasz Nojek

Reputation: 1641

I took the previous answers and added the logic from the source code of BinaryReader.ReadBytes to get a solution that does not reallocate the buffers on every iteration and does not break when FileStream.Read returns fewer bytes than requested:

public static bool AreSame(string path1, string path2) {
    int BUFFER_SIZE = 64 * 1024;
    byte[] buffer1 = new byte[BUFFER_SIZE];
    byte[] buffer2 = new byte[BUFFER_SIZE];

    int ReadBytes(FileStream fs, byte[] buffer) {
        int totalBytes = 0;
        int count = buffer.Length;
        while (count > 0) {
            int readBytes = fs.Read(buffer, totalBytes, count);
            if (readBytes == 0)
                break;

            totalBytes += readBytes;
            count -= readBytes;
        }

        return totalBytes;
    }

    using (FileStream fs1 = new FileStream(path1, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (FileStream fs2 = new FileStream(path2, FileMode.Open, FileAccess.Read, FileShare.Read)) {
        while (true) {
            int count1 = ReadBytes(fs1, buffer1);
            int count2 = ReadBytes(fs2, buffer2);

            if (count1 != count2)
                return false;

            if (count1 == 0)
                return true;

            if (count1 == BUFFER_SIZE) {
                if (!buffer1.SequenceEqual(buffer2))
                    return false;
            } else {
                if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
                    return false;
            }
        }
    }
}

Upvotes: 1

Larry

Reputation: 297

The accepted answer had an error that was pointed out, but never corrected: stream read calls are not guaranteed to return all bytes requested.

BinaryReader.ReadBytes calls, by contrast, are guaranteed to return as many bytes as requested unless the end of the stream is reached first.

The following code takes advantage of BinaryReader to do the comparison:

    static private bool FileEquals(string file1, string file2)
    {
        using (FileStream s1 = new FileStream(file1, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (FileStream s2 = new FileStream(file2, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (BinaryReader b1 = new BinaryReader(s1))
        using (BinaryReader b2 = new BinaryReader(s2))
        {
            while (true)
            {
                byte[] data1 = b1.ReadBytes(64 * 1024);
                byte[] data2 = b2.ReadBytes(64 * 1024);
                if (data1.Length != data2.Length)
                    return false;
                if (data1.Length == 0)
                    return true;
                if (!data1.SequenceEqual(data2))
                    return false;
            }
        }
    }

Upvotes: 6

Mehrdad Afshari

Reputation: 421970

static bool FileEquals(string fileName1, string fileName2)
{
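    // Check file size and CRC equality first; only if both match, compare contents.
    // FileAccess.Read is explicit because FileMode.Open alone requests read/write access.
    using (var file1 = new FileStream(fileName1, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var file2 = new FileStream(fileName2, FileMode.Open, FileAccess.Read, FileShare.Read))
        return FileStreamEquals(file1, file2);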
}

static bool FileStreamEquals(Stream stream1, Stream stream2)
{
    const int bufferSize = 2048;
    byte[] buffer1 = new byte[bufferSize]; //buffer size
    byte[] buffer2 = new byte[bufferSize];
    while (true) {
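        // Note: Stream.Read may return fewer bytes than requested before the end
        // of the stream, so the two counts can differ even for identical content.
        int count1 = stream1.Read(buffer1, 0, bufferSize);
        int count2 = stream2.Read(buffer2, 0, bufferSize);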

        if (count1 != count2)
            return false;

        if (count1 == 0)
            return true;

        // You might replace the following with an efficient "memcmp"
        if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
            return false;
    }
}
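
On newer runtimes, the Take/SequenceEqual line can be swapped for a span comparison, which is about as close to a "memcmp" as safe C# gets. This is only a minimal sketch, assuming .NET Core 2.1 or later (or the System.Memory package on older frameworks); BufferCompare and BlockEquals are made-up names for illustration, not part of the code above:

```csharp
using System;

static class BufferCompare
{
    // Hypothetical helper: compares the first `count` bytes of two buffers
    // without allocating, using the span overload of SequenceEqual.
    public static bool BlockEquals(byte[] buffer1, byte[] buffer2, int count)
    {
        var span1 = new ReadOnlySpan<byte>(buffer1, 0, count);
        var span2 = new ReadOnlySpan<byte>(buffer2, 0, count);
        return span1.SequenceEqual(span2);
    }
}
```

Inside the loop, the last check would then read `if (!BufferCompare.BlockEquals(buffer1, buffer2, count1)) return false;`.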

Upvotes: 43

JonPen

Reputation: 87

This is how I would do it if you don't want to rely on the CRC:

    /// <summary>
    /// Binary comparison of two files
    /// </summary>
    /// <param name="fileName1">the file to compare</param>
    /// <param name="fileName2">the other file to compare</param>
    /// <returns>a value indicating whether the files are identical</returns>
    public static bool CompareFiles(string fileName1, string fileName2)
    {
        FileInfo info1 = new FileInfo(fileName1);
        FileInfo info2 = new FileInfo(fileName2);
        bool same = info1.Length == info2.Length;
        if (same)
        {
            using (FileStream fs1 = info1.OpenRead())
            using (FileStream fs2 = info2.OpenRead())
            using (BufferedStream bs1 = new BufferedStream(fs1))
            using (BufferedStream bs2 = new BufferedStream(fs2))
            {
                for (long i = 0; i < info1.Length; i++)
                {
                    if (bs1.ReadByte() != bs2.ReadByte())
                    {
                        same = false;
                        break;
                    }
                }
            }
        }

        return same;
    }

Upvotes: 9

Lars

Reputation: 657

I sped up the "memcmp" by using an Int64 compare in a loop over the read stream chunks. This reduced the comparison time to about a quarter.

    private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
    {
        const int bufferSize = 2048 * 2;
        var buffer1 = new byte[bufferSize];
        var buffer2 = new byte[bufferSize];

        while (true)
        {
            int count1 = stream1.Read(buffer1, 0, bufferSize);
            int count2 = stream2.Read(buffer2, 0, bufferSize);

            if (count1 != count2)
            {
                return false;
            }

            if (count1 == 0)
            {
                return true;
            }

            int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
            for (int i = 0; i < iterations; i++)
            {
                if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) != BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
                {
                    return false;
                }
            }
        }
    }

Upvotes: 22

Josh

Reputation: 69242

You can check the length and dates of the two files even before checking the CRC to possibly avoid the CRC check.

But if you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the bitness of the CPU. For example, on a 32-bit PC, read 4 bytes at a time and compare them as Int32 values; on a 64-bit PC you can read 8 bytes at a time. This can be roughly 4 or 8 times as fast as comparing byte by byte. You would probably also want to use an unsafe code block so that you can use pointers instead of doing a bunch of bit shifting and OR'ing to get the bytes into the native int size.

You can use IntPtr.Size to determine the ideal size for the current processor architecture.
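
To make the stride idea concrete, here is a rough sketch that compares two already-filled buffers 8 bytes at a time and then finishes the leftover tail byte by byte. It swaps the unsafe pointers for MemoryMarshal.Cast, which reinterprets the bytes as 64-bit words in safe code, and it hard-codes an 8-byte stride rather than consulting IntPtr.Size. It assumes .NET Core 2.1 or later and a platform that tolerates the resulting memory access pattern; WordCompare and BlocksEqual are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;

static class WordCompare
{
    // Hypothetical helper: compares the first `count` bytes of two buffers,
    // eight bytes per step, then the remaining (count % 8) bytes one by one.
    public static bool BlocksEqual(byte[] a, byte[] b, int count)
    {
        // Largest multiple of 8 that fits inside `count`.
        int wordBytes = count - (count % sizeof(ulong));

        // Reinterpret the leading bytes of each buffer as 64-bit words.
        ReadOnlySpan<ulong> wordsA = MemoryMarshal.Cast<byte, ulong>(a.AsSpan(0, wordBytes));
        ReadOnlySpan<ulong> wordsB = MemoryMarshal.Cast<byte, ulong>(b.AsSpan(0, wordBytes));

        for (int i = 0; i < wordsA.Length; i++)
        {
            if (wordsA[i] != wordsB[i])
                return false;
        }

        // Tail: the bytes that did not fill a whole word.
        for (int i = wordBytes; i < count; i++)
        {
            if (a[i] != b[i])
                return false;
        }

        return true;
    }
}
```

Whether this actually beats a plain span comparison depends on the runtime, so measure before relying on it.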

Upvotes: 2

albertjan

Reputation: 7817

If you change that CRC to a SHA-1 signature, the chances of two different files having the same signature are astronomically small.
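
For illustration, computing such a signature from a stream could look roughly like this. It is a sketch, not a drop-in replacement for the existing CRC column: FileHash and Sha1Hex are invented names, and storing the result as an uppercase hex string is just one possible choice:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileHash
{
    // Hypothetical helper: reads the stream to its end and returns the
    // SHA-1 digest as a 40-character uppercase hex string.
    public static string Sha1Hex(Stream stream)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

ComputeHash reads the stream in chunks internally, so the whole file is never held in memory at once. The stored signature can then be compared against `FileHash.Sha1Hex(postedStream)` for the uploaded file.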

Upvotes: 3
