Reputation: 9737
I am looking for a really fast way to check for duplicates in a list of objects.
I was thinking of simply looping through the list and doing a manual comparison, but I thought that LINQ might provide a more elegant solution...
Suppose I have an object...
public class dupeCheckee
{
public string checkThis { get; set; }
public string checkThat { get; set; }
public dupeCheckee(string val, string val2)
{
checkThis = val;
checkThat = val2;
}
}
And I have a list of those objects
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
I need to find the dupes in that list. When I find them, I need to run some additional logic, not necessarily remove them.
When I use LINQ, my GroupBy call somehow fails with this compiler error...
'System.Collections.Generic.List<dupeCheckee>' does not contain a definition for 'GroupBy' and no extension method 'GroupBy' accepting a first argument of type 'System.Collections.Generic.List<dupeCheckee>' could be found (are you missing a using directive or an assembly reference?)
Which tells me that I am missing a library or a using directive. I am having a hard time figuring out which one, though.
Once I figure that out, how would I check for those two conditions, i.e. that the same checkThis and checkThat pair occurs more than once?
UPDATE: What I came up with
This is the LINQ query that I came up with after some quick research...
test.Count != test.Select(c => new { c.checkThat, c.checkThis }).Distinct().Count()
I am not certain if this is definitely better than this answer...
var duplicates = test.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
I know I can put the first statement into an if/else clause. I also ran a quick test. The duplicates query gives me back 1 when I was expecting 0, but it did correctly detect that one of the sets I used had duplicates...
The other methodology does exactly as I expect it to. Here are the data sets that I use to test this out....
Dupes:
List<dupeCheckee> test = new List<dupeCheckee>{
    new dupeCheckee("test0", "test1"),
    new dupeCheckee("test1", "test2"),
    new dupeCheckee("test2", "test3"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test0", "test5"),
    new dupeCheckee("test1", "test6"),
    new dupeCheckee("test2", "test7"),
    new dupeCheckee("test3", "test8"),
    new dupeCheckee("test0", "test5"),
    new dupeCheckee("test1", "test1"),
    new dupeCheckee("test2", "test2"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test4", "test4"),
};
No dupes...
List<dupeCheckee> test2 = new List<dupeCheckee>{
    new dupeCheckee("test0", "test1"),
    new dupeCheckee("test1", "test2"),
    new dupeCheckee("test2", "test3"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test4", "test5"),
    new dupeCheckee("test5", "test6"),
    new dupeCheckee("test6", "test7"),
    new dupeCheckee("test7", "test8"),
    new dupeCheckee("test8", "test5"),
    new dupeCheckee("test9", "test1"),
    new dupeCheckee("test2", "test2"),
    new dupeCheckee("test3", "test3"),
    new dupeCheckee("test4", "test4"),
};
Upvotes: 44
Views: 90357
Reputation: 36
I introduced a generic extension method that checks for duplicates by key:
public static class CollectionExtensions
{
public static bool HasDuplicatesByKey<TSource, TKey>(this IEnumerable<TSource> source
, Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector).Any(group => group.Skip(1).Any());
}
}
Usage example in code:
if (items.HasDuplicatesByKey(item => item.Id))
{
throw new InvalidOperationException($@"Set {nameof(items)} has duplicates.");
}
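GroupBy reads the whole sequence and builds every group before Any can answer. If the list is large and a duplicate tends to show up early, a HashSet-based variant can stop at the first repeated key. The following is only a sketch of that alternative, not the extension above; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateExtensions
{
    // Hypothetical helper: returns true as soon as a key is seen a second time.
    public static bool HasDuplicatesByKeyFast<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet<T>.Add returns false when the key is already in the set.
            if (!seen.Add(keySelector(item)))
                return true;
        }
        return false;
    }
}
```

For a composite key, an anonymous type such as new { item.checkThis, item.checkThat } can be used as the key, because anonymous types compare their properties by value.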
Upvotes: 0
Reputation: 18474
You need to reference System.Linq (e.g. add a using System.Linq; directive), then you can do
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any());
This will give you the groups that contain duplicates.
The test for duplicates would then be
var hasDupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any()).Any();
or even call ToList() or ToArray() to force the calculation of the result, and then you can both check for dupes and examine them,
e.g.
var dupes = dupList.GroupBy(x => new {x.checkThis, x.checkThat})
.Where(x => x.Skip(1).Any()).ToArray();
if (dupes.Any()) {
foreach (var dupeGroup in dupes) {
Console.WriteLine(string.Format("checkThis={0},checkThat={1} has {2} duplicates",
dupeGroup.Key.checkThis,
dupeGroup.Key.checkThat,
dupeGroup.Count() - 1));
}
}
Alternatively
var dupes = dupList.Select((x, i) => new { index = i, value = x})
.GroupBy(x => new {x.value.checkThis, x.value.checkThat})
.Where(x => x.Skip(1).Any());
This gives you groups in which each item stores its original index in the property index and the item itself in the property value.
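To illustrate the indexed variant, here is a minimal sketch that returns the positions of each duplicated pair. It assumes value tuples as a stand-in for dupeCheckee so that the block is self-contained; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedDuplicates
{
    // For each (checkThis, checkThat) pair that occurs more than once,
    // returns the positions at which it occurs in the input.
    public static List<int[]> FindDuplicateIndexes(
        IEnumerable<(string checkThis, string checkThat)> items)
    {
        return items
            .Select((x, i) => new { index = i, value = x })
            .GroupBy(x => x.value)          // value tuples compare field by field
            .Where(g => g.Skip(1).Any())    // keep only groups with 2+ items
            .Select(g => g.Select(x => x.index).ToArray())
            .ToList();
    }
}
```

For the input ("test1","value1"), ("test2","value1"), ("test1","value1"), ("test1","value2") this yields a single entry holding the indexes 0 and 2.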
Upvotes: 72
Reputation: 4776
There are a lot of working solutions here, but I think the following one is more transparent and easier to understand than all of the above:
var hasDuplicatedEntries = ListWithPossibleDuplicates
.GroupBy(YourGroupingExpression)
.Any(e => e.Count() > 1);
if(hasDuplicatedEntries)
{
// Do whatever you want when the list contains duplicates
}
Upvotes: 20
Reputation: 19
If any duplicate occurs, ToDictionary throws an ArgumentException, because a Dictionary checks its keys for uniqueness by itself. This is the easiest way.
try
{
dupList.ToDictionary(a=>new {a.checkThis,a.checkThat});
}
catch (ArgumentException)
{
//message: list items are not unique
}
Upvotes: 0
Reputation: 261
I like using this for knowing when there are any duplicates at all. Let's say you had a string and wanted to know if there were any duplicate characters. This is what I use.
string text = "this is some text";
var hasDupes = text.GroupBy(x => x).Any(grp => grp.Count() > 1);
If you wanted to know how many different values are duplicated, no matter what the duplicates are, use this.
var totalDupeItems = text.GroupBy(x => x).Count(grp => grp.Count() > 1);
So for example, "this is some text" has this...
total of letter t: 3
total of letter i: 2
total of letter s: 3
total of letter e: 2
total of spaces: 3
So variable totalDupeItems would equal 5, because the space is grouped like any other character. There are 5 different kinds of duplicates.
If you wanted to get the total number of dupe items no matter what the dupes are, then use this.
var totalDupes = text.GroupBy(x => x).Where(grp => grp.Count() > 1).Sum(grp => grp.Count());
So the variable totalDupes would be 13. This is the total duplicate items of each dupe type added together.
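GroupBy(x => x) on a string groups every character, so spaces and punctuation take part in the counts as well. If only letters should be counted, one option is to filter the characters first. A minimal sketch, with hypothetical helper names:

```csharp
using System;
using System.Linq;

public static class LetterDuplicates
{
    // Number of different letters that occur more than once.
    public static int CountDuplicatedLetters(string text) =>
        text.Where(c => char.IsLetter(c))
            .GroupBy(c => c)
            .Count(grp => grp.Count() > 1);

    // Total occurrences of all letters that occur more than once.
    public static int CountDuplicateOccurrences(string text) =>
        text.Where(c => char.IsLetter(c))
            .GroupBy(c => c)
            .Where(grp => grp.Count() > 1)
            .Sum(grp => grp.Count());
}
```

For "this is some text" these return 4 and 10 (the letters t, i, s and e), since the three spaces are filtered out before grouping.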
Upvotes: 5
Reputation: 2583
For in-memory objects I always use the Distinct LINQ method, adding a comparer to the solution.
public class dupeCheckee
{
    public string checkThis { get; set; }
    public string checkThat { get; set; }
    public dupeCheckee(string val, string val2)
    {
        checkThis = val;
        checkThat = val2;
    }
    public class Comparer : IEqualityComparer<dupeCheckee>
    {
        public bool Equals(dupeCheckee x, dupeCheckee y)
        {
            // The same instance (or two nulls) counts as equal.
            if (ReferenceEquals(x, y))
                return true;
            if (x == null || y == null)
                return false;
            return x.checkThis == y.checkThis && x.checkThat == y.checkThat;
        }
        public int GetHashCode(dupeCheckee obj)
        {
            if (obj == null)
                return 0;
            return (obj.checkThis == null ? 0 : obj.checkThis.GetHashCode()) ^
                   (obj.checkThat == null ? 0 : obj.checkThat.GetHashCode());
        }
    }
}
Now we can call
List<dupeCheckee> dupList = new List<dupeCheckee>();
dupList.Add(new dupeCheckee("test1", "value1"));
dupList.Add(new dupeCheckee("test2", "value1"));
dupList.Add(new dupeCheckee("test3", "value1"));
dupList.Add(new dupeCheckee("test1", "value1"));//dupe
dupList.Add(new dupeCheckee("test2", "value1"));//dupe...
dupList.Add(new dupeCheckee("test4", "value1"));
dupList.Add(new dupeCheckee("test5", "value1"));
dupList.Add(new dupeCheckee("test1", "value2"));//not dupe
var distinct = dupList.Distinct(new dupeCheckee.Comparer());
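Distinct on its own only removes the repeats. To answer whether there were any, or to get at them, the same comparer can be reused. The sketch below is one possible way to do that; the class and method names are hypothetical, and it is written generically so it works with any IEqualityComparer<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerChecks
{
    // True when Distinct removed at least one item, i.e. the input had duplicates.
    public static bool HasDuplicates<T>(
        IReadOnlyCollection<T> items, IEqualityComparer<T> comparer) =>
        items.Distinct(comparer).Count() != items.Count;

    // Every occurrence after the first one of each duplicated item.
    public static List<T> ExtraCopies<T>(
        IEnumerable<T> items, IEqualityComparer<T> comparer) =>
        items.GroupBy(x => x, comparer)
             .SelectMany(g => g.Skip(1))
             .ToList();
}
```

With the class from this answer it would be called as ComparerChecks.HasDuplicates(dupList, new dupeCheckee.Comparer()).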
Upvotes: 1
Reputation: 3500
I think this is what you're looking for:
List<dupeCheckee> duplicates = dupList.GroupBy(x => new { x.checkThis, x.checkThat })
.SelectMany(g => g.Skip(1))
.ToList();
Upvotes: 1
Reputation: 33143
Do a select distinct with LINQ; see e.g. "How can I do SELECT UNIQUE with LINQ?"
Then compare the count of the distinct results with the count of the non-distinct results. That gives you a boolean saying whether the list has duplicates.
Also, you could try using a Dictionary, which will guarantee the key is unique.
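One way the Dictionary idea could look is to count occurrences per key in a single pass; any key with a count above 1 is a duplicate. This is a sketch only, and the helper name CountByKey is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryCounts
{
    // Counts how often each key occurs in the input.
    public static Dictionary<TKey, int> CountByKey<T, TKey>(
        IEnumerable<T> items, Func<T, TKey> keySelector)
    {
        var counts = new Dictionary<TKey, int>();
        foreach (var item in items)
        {
            var key = keySelector(item);
            // TryGetValue leaves n at 0 when the key has not been seen yet.
            counts.TryGetValue(key, out var n);
            counts[key] = n + 1;
        }
        return counts;
    }
}
```

For the list in the question, a key selector such as x => (x.checkThis, x.checkThat) would give one entry per pair, and filtering the result for values greater than 1 leaves only the duplicated pairs.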
Upvotes: 0