Reputation: 1921
I want to compare two binary files. One of them is already stored on the server with a pre-calculated CRC32 in the database from when I stored it originally.
I know that if the CRC is different, then the files are definitely different. However, if the CRC is the same, I can't be sure that the files are identical. So I'm looking for an efficient way of comparing the two streams: one from the posted file and one from the file system.
I'm not an expert on streams, but I'm well aware that I could easily shoot myself in the foot here as far as memory usage is concerned.
Upvotes: 34
Views: 41412
Reputation: 139
This is how I do it today, with no explicit loops: compare the file lengths first, then the MD5 hashes. Hope this helps provide an alternative option.
public class FileCompare
{
    public bool IsFileSame(string filePath1, string filePath2) =>
        IsFileSame(new FileInfo(filePath1), new FileInfo(filePath2));
    public bool IsFileSame(FileInfo filePath1, FileInfo filePath2)
    {
        var retVal = false;
        if (filePath1.Exists &&
            filePath2.Exists &&
            filePath1.Length == filePath2.Length)
        {
            using (FileStream inputStream1 = File.OpenRead(filePath1.FullName))
            {
                using (FileStream inputStream2 = File.OpenRead(filePath2.FullName))
                {
                    using (MD5 mD = MD5.Create())
                    {
                        retVal = BitConverter.ToString(mD.ComputeHash(inputStream1))
                            .Equals(BitConverter.ToString(mD.ComputeHash(inputStream2)));
                    }
                }
            }
        }
        return retVal;
    }
}
Upvotes: 0
Reputation: 1641
I took the previous answers and added the logic from the source code of BinaryReader.ReadBytes to get a solution that does not recreate the buffer on every loop iteration and does not suffer from unexpected return values from FileStream.Read:
public static bool AreSame(string path1, string path2) {
    int BUFFER_SIZE = 64 * 1024;
    byte[] buffer1 = new byte[BUFFER_SIZE];
    byte[] buffer2 = new byte[BUFFER_SIZE];
    int ReadBytes(FileStream fs, byte[] buffer) {
        int totalBytes = 0;
        int count = buffer.Length;
        while (count > 0) {
            int readBytes = fs.Read(buffer, totalBytes, count);
            if (readBytes == 0)
                break;
            totalBytes += readBytes;
            count -= readBytes;
        }
        return totalBytes;
    }
    using (FileStream fs1 = new FileStream(path1, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (FileStream fs2 = new FileStream(path2, FileMode.Open, FileAccess.Read, FileShare.Read)) {
        while (true) {
            int count1 = ReadBytes(fs1, buffer1);
            int count2 = ReadBytes(fs2, buffer2);
            if (count1 != count2)
                return false;
            if (count1 == 0)
                return true;
            if (count1 == BUFFER_SIZE) {
                if (!buffer1.SequenceEqual(buffer2))
                    return false;
            } else {
                if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
                    return false;
            }
        }
    }
}
Upvotes: 1
Reputation: 297
The accepted answer had an error that was pointed out, but never corrected: stream read calls are not guaranteed to return all bytes requested.
BinaryReader ReadBytes calls are guaranteed to return as many bytes as requested unless the end of the stream is reached first.
The following code takes advantage of BinaryReader to do the comparison:
static private bool FileEquals(string file1, string file2)
{
    using (FileStream s1 = new FileStream(file1, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (FileStream s2 = new FileStream(file2, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BinaryReader b1 = new BinaryReader(s1))
    using (BinaryReader b2 = new BinaryReader(s2))
    {
        while (true)
        {
            byte[] data1 = b1.ReadBytes(64 * 1024);
            byte[] data2 = b2.ReadBytes(64 * 1024);
            if (data1.Length != data2.Length)
                return false;
            if (data1.Length == 0)
                return true;
            if (!data1.SequenceEqual(data2))
                return false;
        }
    }
}
Upvotes: 6
Reputation: 421970
static bool FileEquals(string fileName1, string fileName2)
{
    // Check the file size and CRC equality here.. if they are equal...
    // Open read-only so that read-only files can be compared as well
    using (var file1 = new FileStream(fileName1, FileMode.Open, FileAccess.Read))
    using (var file2 = new FileStream(fileName2, FileMode.Open, FileAccess.Read))
        return FileStreamEquals(file1, file2);
}
static bool FileStreamEquals(Stream stream1, Stream stream2)
{
    const int bufferSize = 2048;
    byte[] buffer1 = new byte[bufferSize]; //buffer size
    byte[] buffer2 = new byte[bufferSize];
    while (true) {
        // Note: Stream.Read may return fewer bytes than requested, so the two
        // counts can differ even when the remaining content is identical
        int count1 = stream1.Read(buffer1, 0, bufferSize);
        int count2 = stream2.Read(buffer2, 0, bufferSize);
        if (count1 != count2)
            return false;
        if (count1 == 0)
            return true;
        // You might replace the following with an efficient "memcmp"
        if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
            return false;
    }
}
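As a rough sketch of the "efficient memcmp" mentioned in the comment above: assuming a runtime with span support (.NET Core 2.1 or later, which is an assumption about the target platform), the LINQ comparison could be swapped for a span comparison, which the runtime typically optimizes far better than an element-by-element LINQ enumeration. `ChunkCompare` and `ChunkEquals` are hypothetical names used only for illustration.

```csharp
using System;

static class ChunkCompare
{
    // Hypothetical helper: compares the first `count` bytes of two buffers
    // without allocating or going through LINQ enumerators.
    public static bool ChunkEquals(byte[] buffer1, byte[] buffer2, int count)
    {
        return buffer1.AsSpan(0, count).SequenceEqual(buffer2.AsSpan(0, count));
    }
}
```

Inside the read loop, the last check would then become something like `if (!ChunkCompare.ChunkEquals(buffer1, buffer2, count1)) return false;`.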
Upvotes: 43
Reputation: 87
This is how I would do it if you didn't want to rely on the CRC:
/// <summary>
/// Binary comparison of two files
/// </summary>
/// <param name="fileName1">the file to compare</param>
/// <param name="fileName2">the other file to compare</param>
/// <returns>a value indicating whether the files are identical</returns>
public static bool CompareFiles(string fileName1, string fileName2)
{
    FileInfo info1 = new FileInfo(fileName1);
    FileInfo info2 = new FileInfo(fileName2);
    bool same = info1.Length == info2.Length;
    if (same)
    {
        using (FileStream fs1 = info1.OpenRead())
        using (FileStream fs2 = info2.OpenRead())
        using (BufferedStream bs1 = new BufferedStream(fs1))
        using (BufferedStream bs2 = new BufferedStream(fs2))
        {
            for (long i = 0; i < info1.Length; i++)
            {
                if (bs1.ReadByte() != bs2.ReadByte())
                {
                    same = false;
                    break;
                }
            }
        }
    }
    return same;
}
Upvotes: 9
Reputation: 657
I sped up the "memcmp" by using an Int64 comparison in a loop over the read stream chunks. This reduced the run time to about a quarter.
private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
{
    const int bufferSize = 2048 * 2;
    var buffer1 = new byte[bufferSize];
    var buffer2 = new byte[bufferSize];
    while (true)
    {
        int count1 = stream1.Read(buffer1, 0, bufferSize);
        int count2 = stream2.Read(buffer2, 0, bufferSize);
        if (count1 != count2)
        {
            return false;
        }
        if (count1 == 0)
        {
            return true;
        }
        int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
        for (int i = 0; i < iterations; i++)
        {
            if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) != BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
            {
                return false;
            }
        }
    }
}
Upvotes: 22
Reputation: 69242
You can check the length and dates of the two files even before checking the CRC to possibly avoid the CRC check.
But if you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the word size of the CPU. For example, on a 32-bit PC, read 4 bytes at a time and compare them as Int32 values; on a 64-bit PC you can read 8 bytes at a time. This can be roughly 4 or 8 times as fast as comparing byte by byte. You would probably also want to use an unsafe code block so that you can use pointers instead of doing a bunch of bit shifting and OR'ing to get the bytes into the native integer size.
You can use IntPtr.Size to determine the ideal size for the current processor architecture.
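A minimal sketch of the stride idea, under a few assumptions: it uses a fixed 8-byte stride rather than picking one from IntPtr.Size, and it swaps the unsafe pointer approach for `MemoryMarshal.Cast`, which needs span support (.NET Core 2.1 or later). `StrideCompare` and `BuffersEqual` are hypothetical names, and any actual speedup would have to be measured.

```csharp
using System;
using System.Runtime.InteropServices;

static class StrideCompare
{
    // Hypothetical helper: compares two byte buffers 8 bytes at a time.
    public static bool BuffersEqual(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length != b.Length)
            return false;

        // Reinterpret the buffers as 64-bit words; Cast rounds the length
        // down, so up to 7 trailing bytes are left for the byte loop below.
        ReadOnlySpan<long> wordsA = MemoryMarshal.Cast<byte, long>(a);
        ReadOnlySpan<long> wordsB = MemoryMarshal.Cast<byte, long>(b);
        for (int i = 0; i < wordsA.Length; i++)
        {
            if (wordsA[i] != wordsB[i])
                return false;
        }

        // Compare the remaining tail bytes one by one.
        for (int i = wordsA.Length * sizeof(long); i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```

Handling the tail separately avoids reading past the valid data when the length is not a multiple of 8.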
Upvotes: 2
Reputation: 7817
If you change that CRC to a SHA-1 signature, the chances of the files being different but having the same signature are astronomically small.
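A minimal sketch of computing such a signature, assuming it is stored as a lowercase hex string (the storage format is an assumption, as are the names `Sha1Signature` and `ComputeSha1Hex`). `ComputeHash` reads the stream in chunks, so the whole file is not loaded into memory.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class Sha1Signature
{
    // Hypothetical helper: returns the SHA-1 hash of the stream's remaining
    // content as a 40-character lowercase hex string.
    public static string ComputeSha1Hex(Stream stream)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

The stored signature for the server file could then be compared against `ComputeSha1Hex` of the posted stream, with a byte-by-byte comparison kept as a fallback if absolute certainty is required.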
Upvotes: 3