The_Holy_One

Reputation: 331

C# FileInfo - Find duplicate Files

I have a FileInfo array with ~200,000 entries. I need to find all files that have the same file name. For every duplicate file I need the directory name and the file name, because I want to rename them afterwards.

What I've tried already:

Upvotes: 4

Views: 8590

Answers (2)

sga101

Reputation: 1904

This should work:

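using System.Collections.Generic;

// fileNames: the names to check (assumption: e.g. files.Select(f => f.Name),
// which also needs using System.Linq;)
HashSet<string> fileNamesSet = new HashSet<string>();
List<string> duplicates = new List<string>();

foreach (string fileName in fileNames)
{
    if (fileNamesSet.Contains(fileName))
    {
        // Name already seen: record this repeat occurrence.
        duplicates.Add(fileName);
    }
    else
    {
        // First time this name appears: remember it.
        fileNamesSet.Add(fileName);
    }
}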

Then duplicates will contain every file name that occurs more than once, listed once per repeat (the first occurrence of each name is not added).

Note that since Windows file names are case-insensitive, you may wish to take this into account by first converting all of the file names to uppercase using .ToUpperInvariant().
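A minimal sketch of the same HashSet idea applied directly to the FileInfo array, assuming the question's array is passed in. Instead of uppercasing, it passes StringComparer.OrdinalIgnoreCase to the HashSet constructor, and it collects the FileInfo objects themselves so that both DirectoryName and Name are available for renaming. The class and method names (DuplicateFinder, FindLaterDuplicates) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DuplicateFinder
{
    // Returns every file whose name (compared case-insensitively, as on a
    // default Windows file system) was already seen earlier in the input.
    // The first file with each name is NOT included, so the result is the
    // set of files that could be renamed while one original stays in place.
    public static List<FileInfo> FindLaterDuplicates(IEnumerable<FileInfo> files)
    {
        var seenNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<FileInfo>();

        foreach (FileInfo file in files)
        {
            // HashSet<T>.Add returns false when the name is already present.
            if (!seenNames.Add(file.Name))
            {
                duplicates.Add(file);
            }
        }

        return duplicates;
    }
}
```

Usage would look something like: foreach (FileInfo dup in DuplicateFinder.FindLaterDuplicates(files)) Console.WriteLine(dup.DirectoryName + " | " + dup.Name);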

Upvotes: 2

Jon Skeet

Reputation: 1500225

Sounds like this should do it:

var duplicateNames = files.GroupBy(file => file.Name)
                          .Where(group => group.Count() > 1)
                          .Select(group => group.Key);

Now would be a very good time to learn LINQ. It's incredibly useful: time spent learning it (even just LINQ to Objects) will pay for itself very quickly.
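A sketch of the query above adapted to what the question asks for, assuming the input is the FileInfo array: it groups case-insensitively using the GroupBy overload that takes an IEqualityComparer<string>, then flattens the duplicate groups back into (directory, name) pairs. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DuplicateQueries
{
    // Every file that shares its name (ignoring case) with at least one other
    // file, reduced to the two pieces of information the question needs.
    public static List<(string DirectoryName, string Name)> DuplicateEntries(IEnumerable<FileInfo> files)
    {
        return files
            .GroupBy(file => file.Name, StringComparer.OrdinalIgnoreCase)
            .Where(group => group.Count() > 1)
            // Flatten the duplicate groups back into individual files...
            .SelectMany(group => group)
            // ...and keep only the directory and the file name.
            .Select(file => (file.DirectoryName, file.Name))
            .ToList();
    }
}
```

Unlike the HashSet approach, this includes the first occurrence of each name as well, so every member of a duplicate group is reported.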

EDIT: Okay, if you want the original FileInfo objects for each group, just drop the Select:

var duplicateGroups = files.GroupBy(file => file.Name)
                           .Where(group => group.Count() > 1);

// Replace with what you want to do
foreach (var group in duplicateGroups)
{
    Console.WriteLine("Files with name {0}", group.Key);
    foreach (var file in group)
    {
        Console.WriteLine("  {0}", file.FullName);
    }
}
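Since the end goal is renaming, here is a sketch that turns the duplicate groups into a rename plan without touching the disk. The naming scheme (inserting " (2)", " (3)", ... before the extension), the case-insensitive comparison, and the class and method names are all hypothetical assumptions, not something from the question; the first file in each group keeps its original name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class RenamePlanner
{
    // Hypothetical scheme: every file after the first in a group of same-named
    // files gets " (2)", " (3)", ... inserted before its extension.
    // Nothing is renamed here; the caller can review the plan first.
    public static List<(string OldPath, string NewPath)> PlanRenames(IEnumerable<FileInfo> files)
    {
        var plan = new List<(string OldPath, string NewPath)>();

        var duplicateGroups = files
            .GroupBy(file => file.Name, StringComparer.OrdinalIgnoreCase)
            .Where(group => group.Count() > 1);

        foreach (var group in duplicateGroups)
        {
            int counter = 2;
            // Skip(1) leaves the first file in each group under its original name.
            foreach (FileInfo file in group.Skip(1))
            {
                string stem = Path.GetFileNameWithoutExtension(file.Name);
                string extension = Path.GetExtension(file.Name);
                string newName = $"{stem} ({counter++}){extension}";
                plan.Add((file.FullName, Path.Combine(file.DirectoryName, newName)));
            }
        }

        return plan;
    }
}
```

Once the plan looks right, each pair could be applied with File.Move(oldPath, newPath). The sketch does not check whether a generated name such as "report (2).txt" already exists in the target directory, so a real run should check File.Exists before moving.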

Upvotes: 11
