Reputation: 941
I need to remove duplicates, but also log which ones I am removing. I have two solutions right now: one that iterates over each duplicate, and one that removes the duplicates. I know that removing items in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.
What I have right now:
var duplicates = ListOfThings
.GroupBy(x => x.ID)
.Where(g => g.Skip(1).Any())
.SelectMany(g => g);
foreach (var duplicate in duplicates)
{
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}
ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();
Upvotes: 0
Views: 170
Reputation: 186708
Well, ToList() will materialize the query, so if you allow side effects (i.e. writing to the log while selecting), it could look like this:
var cleared = ListOfThings
.GroupBy(x => x.ID)
.Select(chunk => {
// Side effect: writing to log while selecting
if (chunk.Skip(1).Any())
Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
// if there're duplicates by Id take the 1st one
return chunk.First();
})
.ToList();
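If you would rather keep the query itself free of side effects and log every removed item (not just the key), a minimal sketch could look like the following. Thing, Deduplicator, RemoveDuplicates and the onRemoved callback are hypothetical names used only for illustration; the callback is where a call such as Log.Append would go.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the element type of ListOfThings
public class Thing
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class Deduplicator
{
    // Keeps the first item per ID and reports every dropped item via onRemoved.
    // GroupBy yields groups in order of first key appearance and keeps the
    // source order inside each group, so "first" means first in the list.
    public static List<Thing> RemoveDuplicates(IEnumerable<Thing> source, Action<Thing> onRemoved)
    {
        var kept = new List<Thing>();
        foreach (var group in source.GroupBy(x => x.ID))
        {
            kept.Add(group.First());
            foreach (var removed in group.Skip(1))
                onRemoved(removed); // e.g. write the log entry here
        }
        return kept;
    }
}
```

Since the source list is only read and a new list is built, nothing is removed in place while iterating.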
Upvotes: 2
Reputation: 787
You can use a HashSet and union it with the list to get unique items; you just need to replace the default reference comparison with your own. Implementing IEqualityComparer<T> is flexible: if ID alone makes two objects equal, compare only that; if more fields matter, you can extend the comparer.
You can get duplicates with LINQ.
void Main()
{
//your original class:
List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
//i'm doing this in LINQPad; if you're using VS you may need to foreach the object
Console.WriteLine(originalList);
//collect the groups that contain duplicates so you can log them as you did
var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList();
Console.WriteLine(duplicateItems);
//create a custom comparer to compare your list; if you care about more than ID then you can extend this
var tec = new ThingsEqualityComparer();
var listThings = new HashSet<Things>(tec);
listThings.UnionWith(originalList);
Console.WriteLine(listThings);
}
// Define other methods and classes here
public class Things
{
public int ID {get;set;}
public Things(int id)
{
ID = id;
}
}
public class ThingsEqualityComparer : IEqualityComparer<Things>
{
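public bool Equals(Things thing1, Things thing2)
{
// Same reference (or both null) is equal; exactly one null is not
if (ReferenceEquals(thing1, thing2)) return true;
if (thing1 == null || thing2 == null) return false;
return thing1.ID == thing2.ID;
}
public int GetHashCode(Things thing)
{
// Must be consistent with Equals: equal IDs produce equal hash codes
return thing.ID.GetHashCode();
}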
}
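UnionWith silently drops the duplicates, so it does not tell you which items were removed. One way to get both, sketched here under the assumption that a callback is acceptable for logging, relies on HashSet<T>.Add returning false when an equal item is already in the set. DistinctLogger, DistinctWithLog and onDuplicate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctLogger
{
    // Returns the first occurrence of each item (as defined by comparer)
    // and passes every later, equal item to onDuplicate.
    public static List<T> DistinctWithLog<T>(
        IEnumerable<T> source,
        IEqualityComparer<T> comparer,
        Action<T> onDuplicate)
    {
        var seen = new HashSet<T>(comparer);
        var result = new List<T>();
        foreach (var item in source)
        {
            if (seen.Add(item))
                result.Add(item);   // first time this item is seen
            else
                onDuplicate(item);  // an equal item was already kept
        }
        return result;
    }
}
```

With the classes above, a call could look like DistinctLogger.DistinctWithLog(originalList, new ThingsEqualityComparer(), t => Console.WriteLine(t.ID)), which makes a single pass over the list.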
Upvotes: 0
Reputation: 31626
Why group when one can use the Aggregate function to determine the duplicates for both the report and the result?
Example
var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha"};
var duplicatesDictionary =
items.Aggregate (new Dictionary<string, int>(),
(results, itm) =>
{
if (results.ContainsKey(itm))
results[itm]++;
else
results.Add(itm, 1);
return results;
});
The resulting dictionary holds the number of occurrences of each item: Alpha = 3, Beta = 1, Gamma = 1.
Now extract the duplicates report for any count above 1.
duplicatesDictionary.Where (kvp => kvp.Value > 1)
.Select (kvp => string.Format("{0} appeared {1} times", kvp.Key, kvp.Value))
Finally, the de-duplicated result is just the keys:
duplicatesDictionary.Select (kvp => kvp.Key);
Upvotes: 0