Reputation: 221
Let's say I have a folder with five hundred pictures in it, and I want to check for repeats and delete them.
Here's the code I have right now:
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(filename))
        {
            return md5.ComputeHash(stream);
        }
    }
Would this be viable to spot repeated MD5s in a specific folder, provided I loop it accordingly?
Upvotes: 0
Views: 455
Reputation: 3245
Creating hashes in order to identify identical files is OK, in any programming language, on any OS. It is slow, though, because you read the whole file even if that is not necessary.
I would recommend several passes for finding duplicates:
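A minimal sketch of one such arrangement, assuming a cheap first pass on file size and an MD5 pass only over files that share a size. The names DuplicateFinder, FindDuplicates and HashToHex are made up for illustration, and the choice of passes is an assumption rather than a fixed recipe:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

internal static class DuplicateFinder   // hypothetical name
{
    // Returns groups of paths whose files share both length and MD5 hash.
    public static List<List<string>> FindDuplicates(string directoryPath)
    {
        var result = new List<List<string>>();

        // Pass 1: group by size. A file with a unique size cannot have a duplicate,
        // so it is never opened at all.
        var bySize = new DirectoryInfo(directoryPath)
            .EnumerateFiles()
            .GroupBy(f => f.Length)
            .Where(g => g.Count() > 1);

        foreach (var sizeGroup in bySize)
        {
            // Pass 2: hash only the files that share a size.
            var byHash = sizeGroup
                .GroupBy(f => HashToHex(f.FullName))
                .Where(g => g.Count() > 1);

            foreach (var hashGroup in byHash)
            {
                result.Add(hashGroup.Select(f => f.FullName).ToList());
            }
        }
        return result;
    }

    private static string HashToHex(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            // A string is used as the grouping key because byte[] compares
            // by reference, not by content.
            return BitConverter.ToString(md5.ComputeHash(stream));
        }
    }
}
```

The sketch only reports the groups and deletes nothing. If a hash match alone is not enough for you, a final byte-by-byte comparison within each group would be a natural third pass.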
There is a risk of hash collisions. You cannot avoid it with hash algorithms. As MD5 uses 128 bits, the risk is 1 : (1 << 128), that is 1 in 2^128 (roughly 3 × 10^-39), for two random files. Your chances of hitting the jackpot in your national lottery four times in a row, with only one ticket each week, are much better than getting a hash collision on a random pair of files.
The probability of a hash collision does rise somewhat when you compare the hashes of many files. The mathematically interested, and people implementing hash containers, should look up the "birthday problem". Mere mortals can trust MD5 hashes as long as they are not implementing cryptographic algorithms.
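To put a number on "somewhat", here is a rough sketch of the usual birthday-problem approximation: n files form n(n-1)/2 pairs, and each pair collides with probability 2^-bits. It assumes hashes behave like uniformly random values and only holds while the result is far below 1. The names CollisionEstimate and Probability are made up for illustration:

```csharp
using System;

internal static class CollisionEstimate   // hypothetical name
{
    // Approximate probability that at least two of fileCount random files
    // share a hash of hashBits bits: (number of pairs) * 2^-hashBits.
    public static double Probability(long fileCount, int hashBits)
    {
        double pairs = (double)fileCount * (fileCount - 1) / 2.0;
        return pairs * Math.Pow(2.0, -hashBits);
    }
}
```

For the 500 pictures in the question, `CollisionEstimate.Probability(500, 128)` comes out on the order of 10^-34, so the risk grows by about five orders of magnitude over a single pair and is still negligible.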
Upvotes: 2
Reputation: 119
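    using System;
    using System.Collections.Generic;
    using System.IO;

    internal static class FileComparer
    {
        // Deletes every file in directoryPath that is byte-for-byte identical
        // to an earlier file in the same directory.
        public static void Compare(string directoryPath)
        {
            if (!Directory.Exists(directoryPath))
            {
                return;
            }
            Compare(new DirectoryInfo(directoryPath));
        }

        private static void Compare(DirectoryInfo info)
        {
            List<FileInfo> files = new List<FileInfo>(info.EnumerateFiles());
            var deleted = new HashSet<int>(); // indices of files already removed

            for (int i = 0; i < files.Count; i++)
            {
                if (deleted.Contains(i))
                {
                    continue;
                }
                byte[] array = File.ReadAllBytes(files[i].FullName);

                // Start at i + 1 so that a file is never compared with itself
                // (and then deleted as its own duplicate).
                for (int j = i + 1; j < files.Count; j++)
                {
                    // Skip files that are already gone, and files of a different
                    // size, without reading their contents.
                    if (deleted.Contains(j) || files[j].Length != array.Length)
                    {
                        continue;
                    }
                    byte[] array2 = File.ReadAllBytes(files[j].FullName);
                    bool flag = true;
                    for (int current = 0; current < array.Length; current++)
                    {
                        if (array[current] != array2[current])
                        {
                            flag = false;
                            break;
                        }
                    }
                    if (flag)
                    {
                        files[j].Delete();
                        deleted.Add(j);
                    }
                }
            }
        }
    }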
Upvotes: 1