SoftwareSavant

Reputation: 9737

Checking for duplicates in a List of Objects C#

I am looking for a really fast way to check for duplicates in a list of objects.

I was thinking of simply looping through the list and comparing the items manually, but I thought that LINQ might provide a more elegant solution...

Suppose I have an object...

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }
}

And I have a list of those objects

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

I need to find the dupes in that list. When I find them, I need to run some additional logic, not necessarily remove them.

When I use LINQ, my GroupBy call somehow fails with this error...

'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)

That tells me I am missing a library, but I am having a hard time figuring out which one.

Once I figure that out, how would I check for both conditions, i.e. that the same checkThis and checkThat pair occurs more than once?

UPDATE: What I came up with

This is the LINQ query that I came up with after some quick research...

test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()

I am not certain whether this is better than the approach from this answer...

var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

I know I can put the first statement into an if/else clause. I also ran a quick test. The duplicates list gives me back 1 when I was expecting 0, but it did correctly flag that I had duplicates in one of the sets that I used...

The other approach does exactly what I expect. Here are the data sets that I used to test this...

Dupes:

List<DupeCheckee> test = new List<DupeCheckee>{ 
     new DupeCheckee("test0", "test1"),
     new DupeCheckee("test1", "test2"),
     new DupeCheckee("test2", "test3"),
     new DupeCheckee("test3", "test3"),
     new DupeCheckee("test0", "test5"),
     new DupeCheckee("test1", "test6"),
     new DupeCheckee("test2", "test7"),
     new DupeCheckee("test3", "test8"),
     new DupeCheckee("test0", "test5"),
     new DupeCheckee("test1", "test1"),
     new DupeCheckee("test2", "test2"),
     new DupeCheckee("test3", "test3"),
     new DupeCheckee("test4", "test4"),
};

No dupes...

List<DupeCheckee> test2 = new List<DupeCheckee>{ 
     new DupeCheckee("test0", "test1"),
     new DupeCheckee("test1", "test2"),
     new DupeCheckee("test2", "test3"),
     new DupeCheckee("test3", "test3"),
     new DupeCheckee("test4", "test5"),
     new DupeCheckee("test5", "test6"),
     new DupeCheckee("test6", "test7"),
     new DupeCheckee("test7", "test8"),
     new DupeCheckee("test8", "test5"),
     new DupeCheckee("test9", "test1"),
     new DupeCheckee("test2", "test2"),
     new DupeCheckee("test3", "test3"),
     new DupeCheckee("test4", "test4"),
};

Upvotes: 44

Views: 90357

Answers (8)

LMV

Reputation: 36

I introduced an extension method for this:

public static class CollectionExtensions
{
    public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
                                                       , Func<TSource, TKey> keySelector)
    {
        return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
    }
}

Usage example:

if (items.HasDuplicatesByKey(item => item.Id))
{
    throw new InvalidOperationException($@"Set {nameof(items)} has duplicates.");
}

Upvotes: 0

Bob Vale

Reputation: 18474

You need to import the System.Linq namespace (i.e. add using System.Linq; at the top of the file).

Then you can do:

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any());

This will give you the groups that contain duplicates.

The test for duplicates would then be

var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).Any();

Or you can call ToList() or ToArray() to force the result to be computed once, and then both check for dupes and examine them.

For example:

var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
                   .Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
  foreach (var dupeGroup in dupes) {
    // Key holds the shared values; Count() - 1 is the number of extra copies
    Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
                      dupeGroup.Key.checkThis, 
                      dupeGroup.Key.checkThat,
                      dupeGroup.Count() - 1));
  }
}

Alternatively

var dupes = dupList.Select((x, i) => new { index = i, value = x})
                   .GroupBy(x => new {x.value.checkThis, x.value.checkThat})
                   .Where(x => x.Skip(1).Any());

This gives you the same groups, except that each item in a group stores its original position in the index property and the original object in the value property.
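As a rough sketch of how the index-preserving query could be consumed: the Item class and the DuplicateIndexGroups helper below are hypothetical names used only for illustration (Item stands in for the question's dupeCheckee).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's dupeCheckee class.
public class Item
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }

    public Item(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
}

public static class DuplicateIndexDemo
{
    // For each duplicated (checkThis, checkThat) pair, returns the positions where it occurs.
    public static List<List<int>> DuplicateIndexGroups(IEnumerable<Item> items)
    {
        return items.Select((x, i) => new { index = i, value = x })
                    .GroupBy(x => new { x.value.checkThis, x.value.checkThat })
                    .Where(g => g.Skip(1).Any())
                    .Select(g => g.Select(x => x.index).ToList())
                    .ToList();
    }

    public static void Main()
    {
        var dupList = new List<Item>
        {
            new Item("test1", "value1"),
            new Item("test2", "value1"),
            new Item("test1", "value1"), // dupe of index 0
            new Item("test1", "value2"), // not a dupe
        };

        foreach (var indexes in DuplicateIndexGroups(dupList))
        {
            // prints "Duplicate group at indexes: 0, 2"
            Console.WriteLine("Duplicate group at indexes: " + string.Join(", ", indexes));
        }
    }
}
```

Having the indexes is handy when the follow-up logic needs to touch the original list positions rather than just the objects.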

Upvotes: 72

Maris

Reputation: 4776

There are plenty of working solutions already, but I think the following is more transparent and easier to understand than all of the above:

var hasDuplicatedEntries = ListWithPossibleDuplicates
                                   .GroupBy(YourGroupingExpression)
                                   .Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
   // Do whatever you want when the list contains duplicates
}

Upvotes: 20

Isomiddin

Reputation: 19

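ToDictionary throws an exception if any key occurs more than once, because a Dictionary enforces unique keys by itself. This is the easiest way.

try
{
  // ToDictionary throws ArgumentException when two items produce the same key
  dupList.ToDictionary(a => new { a.checkThis, a.checkThat });
}
catch (ArgumentException)
{
  // message: list items are not unique
}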
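If relying on an exception for control flow feels heavy, a HashSet can be used in a similar spirit, since HashSet<T>.Add returns false for a key it has already seen. A minimal sketch, assuming a hypothetical HasDuplicateKeys helper (not part of LINQ) and Tuple items standing in for the question's objects:

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyCheck
{
    // Hypothetical helper: returns true as soon as a key repeats.
    public static bool HasDuplicateKeys<TSource, TKey>(
        IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // Add returns false when the key is already in the set.
            if (!seen.Add(keySelector(item)))
            {
                return true;
            }
        }
        return false;
    }

    public static void Main()
    {
        var pairs = new List<Tuple<string, string>>
        {
            Tuple.Create("test1", "value1"),
            Tuple.Create("test2", "value1"),
            Tuple.Create("test1", "value1"), // dupe
        };

        // Anonymous types compare by value, so they work as composite keys.
        bool hasDupes = HasDuplicateKeys(pairs, p => new { checkThis = p.Item1, checkThat = p.Item2 });
        Console.WriteLine(hasDupes); // True
    }
}
```

Unlike the GroupBy variants, this stops at the first repeated key, which may matter for long lists where only a yes/no answer is needed.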

Upvotes: 0

Calvin Wilkinson

Reputation: 261

I like using this to find out whether there are any duplicates at all. Let's say you had a string and wanted to know if there were any duplicate letters. This is what I use.

string text = "this is some text";

var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);

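If you want to know how many distinct letters are duplicated, no matter which letters they are, use this. Spaces are filtered out first; otherwise the three spaces in the sample text would count as a duplicate too.

var totalDupeItems = text.Where(c => char.IsLetter(c)).GroupBy(x => x).Count(grp => grp.Count() > 1);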

So for example, "this is some text" has this...

total of letter t: 3

total of letter i: 2

total of letter s: 3

total of letter e: 2

So variable totalDupeItems would equal 4. There are 4 different kinds of duplicates.

If you want the total number of items that belong to a duplicated group, no matter which letters they are, use this (counting letters only, so the spaces are skipped).

var totalDupes = text.Where(c => char.IsLetter(c)).GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());

So the variable totalDupes would be 10. This is the total duplicate items of each dupe type added together.
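To double-check the arithmetic, here is a small sketch that counts repeated letters in the sample string. Note that GroupBy on the raw string also groups the space character, which occurs three times in the sample, so the sketch filters to letters to match the figures above (4 and 10). RepeatedLetterCounts is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetterDuplicates
{
    // Hypothetical helper: maps each repeated letter to its number of occurrences.
    public static Dictionary<char, int> RepeatedLetterCounts(string text)
    {
        return text.Where(c => char.IsLetter(c))
                   .GroupBy(c => c)
                   .Where(grp => grp.Count() > 1)
                   .ToDictionary(grp => grp.Key, grp => grp.Count());
    }

    public static void Main()
    {
        var counts = RepeatedLetterCounts("this is some text");
        Console.WriteLine(counts.Count);        // 4  (t, i, s, e)
        Console.WriteLine(counts.Values.Sum()); // 10 (3 + 2 + 3 + 2)
    }
}
```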

Upvotes: 5

Arturo Martinez

Reputation: 2583

For in-memory objects I always use the LINQ Distinct method and pass it a comparer.

public class dupeCheckee
{
     public string checkThis { get; set; }
     public string checkThat { get; set; }

     public dupeCheckee(string val, string val2)
     {
         checkThis = val;
         checkThat = val2;
     }

     public class Comparer : IEqualityComparer<dupeCheckee>
     {
         public bool Equals(dupeCheckee x, dupeCheckee y)
         {
             if (x == null || y == null)
                 return false;

             return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
         }

         public int GetHashCode(dupeCheckee obj)
         {
             if (obj == null)
                 return 0;

             return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                 (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
         }
     }
}

Now we can call

List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe... 
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe

var distinct = dupList.Distinct(new dupeCheckee.Comparer());

Upvotes: 1

Captain Skyhawk

Reputation: 3500

I think this is what you're looking for:

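// Group by the two values; GroupBy(x => x) would compare references unless Equals/GetHashCode are overridden
List<dupeCheckee> duplicates = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
                                      .SelectMany(g => g.Skip(1))
                                      .ToList();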

Upvotes: 1

MatthewMartin

Reputation: 33143

Do a select distinct with LINQ (see, e.g., "How can I do SELECT UNIQUE with LINQ?").

Then compare the count of the distinct results with the count of the non-distinct results. That will give you a boolean saying whether the list has duplicates.

Also, you could try using a Dictionary, which will guarantee the key is unique.
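The count comparison could look roughly like this. It is only a sketch: HasDuplicates is a hypothetical helper name, and Tuple items stand in for the question's objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctCountCheck
{
    // Hypothetical helper: true when projecting to keys and de-duplicating loses items.
    public static bool HasDuplicates<TSource, TKey>(
        ICollection<TSource> source, Func<TSource, TKey> keySelector)
    {
        return source.Select(keySelector).Distinct().Count() != source.Count;
    }

    public static void Main()
    {
        var rows = new List<Tuple<string, string>>
        {
            Tuple.Create("test1", "value1"),
            Tuple.Create("test1", "value2"), // same first value, different second: not a dupe
            Tuple.Create("test1", "value1"), // dupe
        };

        // Anonymous types compare by value, so Distinct treats equal pairs as one.
        Console.WriteLine(HasDuplicates(rows, r => new { checkThis = r.Item1, checkThat = r.Item2 })); // True
    }
}
```

This only answers whether duplicates exist; it does not say which items are duplicated.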

Upvotes: 0
