Reputation: 941

LINQ: Enumerate through duplicates in List and remove them

I need to remove duplicates, but also log which ones I am removing. I have two solutions right now: one that iterates over each duplicate and one that removes the duplicates. I know that removing items in place inside a foreach is dangerous, so I am a bit stuck on how to do this as efficiently as possible.

What I have right now:

var duplicates = ListOfThings
.GroupBy(x => x.ID)
.Where(g => g.Skip(1).Any())
.SelectMany(g => g);

foreach (var duplicate in duplicates)
{
    Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", duplicate.ID);
}


ListOfThings = ListOfThings.GroupBy(x => x.ID).Select(y => y.First()).ToList();

Upvotes: 0

Answers (3)

Dmitrii Bychenko

Reputation: 186708

Well, ToList() materializes the query immediately, so the Select body runs exactly once per group. If you allow side effects (i.e. writing to the log) inside the query, it could look like this:

var cleared = ListOfThings
  .GroupBy(x => x.ID)
  .Select(chunk => {
     // Side effect: writing to log while selecting
     if (chunk.Skip(1).Any()) 
       Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", chunk.Key);
     // if there're duplicates by Id take the 1st one
     return chunk.First();
   })
  .ToList();
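
If you would rather keep the side effect out of the Select lambda, the same single grouping pass can be written as a plain loop. This is only a sketch: the helper name RemoveDuplicates and the onDuplicate callback are hypothetical, not part of the question's code. It relies on LINQ to Objects' GroupBy preserving the order in which keys first appear and the order of items within each group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Deduplicator
{
    // Hypothetical helper: keeps the first item per key and hands every
    // dropped item to onDuplicate (for example, a logging call).
    public static List<T> RemoveDuplicates<T, TKey>(
        IEnumerable<T> source,
        Func<T, TKey> keySelector,
        Action<T> onDuplicate)
    {
        var kept = new List<T>();
        foreach (var group in source.GroupBy(keySelector))
        {
            // The first item of each group survives.
            kept.Add(group.First());

            // Every later item with the same key is reported, not kept.
            foreach (var dropped in group.Skip(1))
                onDuplicate(dropped);
        }
        return kept;
    }
}
```

With the question's types, the call would be along the lines of Deduplicator.RemoveDuplicates(ListOfThings, x => x.ID, d => Log.Append(Logger.Type.Error, "Conflicts with another", "N/A", d.ID)). Unlike the question's first loop, this reports only the items that are actually dropped, not the one that is kept.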

Upvotes: 2

jacoblambert

Reputation: 787

You can use a hash set and union it with the list to get unique items; just supply a custom equality comparer in place of the default reference comparison. Implementing IEqualityComparer<T> is flexible: if ID alone determines whether two objects are equal, compare only that; if more fields matter, you can extend the comparer.

You can get duplicates with LINQ.

void Main()
{
    // Your original list:
    List<Things> originalList = new List<Things> { new Things(5), new Things(3), new Things(5) };
    // I'm doing this in LINQPad; in Visual Studio you may need to foreach over the collection to print it
    Console.WriteLine(originalList);
    // Collect the groups that contain duplicates and log them as you did
    var duplicateItems = originalList.GroupBy(x => x.ID).Where(x => x.Count() > 1).ToList();
    Console.WriteLine(duplicateItems);
    // Create a custom comparer for the set; if you care about more than ID, you can extend it
    var tec = new ThingsEqualityComparer();
    var listThings = new HashSet<Things>(tec);
    listThings.UnionWith(originalList);
    Console.WriteLine(listThings);
}

// Define other methods and classes here
public class Things 
{
    public int ID {get;set;}

    public Things(int id)
    {
        ID = id;
    }
}

public class ThingsEqualityComparer : IEqualityComparer<Things>
{
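    public bool Equals(Things thing1, Things thing2)
    {
        // Same reference (or both null) counts as equal; a single null never matches
        if (ReferenceEquals(thing1, thing2)) return true;
        if (thing1 is null || thing2 is null) return false;
        return thing1.ID == thing2.ID;
    }

    public int GetHashCode(Things thing)
    {
        // Must agree with Equals: equal IDs produce equal hash codes
        return thing.ID.GetHashCode();
    }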
}
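
A related sketch, not part of the code above: HashSet<T>.Add returns false when an equal item is already in the set, so the duplicates can be detected and logged in the same pass that builds the unique list, without a separate GroupBy. The helper name KeepFirst and the onDuplicate callback are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDedup
{
    // Hypothetical helper: walks the source once, keeping the first item of
    // each equivalence class (as defined by the comparer) and reporting the rest.
    public static List<T> KeepFirst<T>(
        IEnumerable<T> source,
        IEqualityComparer<T> comparer,
        Action<T> onDuplicate)
    {
        var seen = new HashSet<T>(comparer);
        var kept = new List<T>();
        foreach (var item in source)
        {
            if (seen.Add(item))
                kept.Add(item);     // first time this item is seen
            else
                onDuplicate(item);  // an equal item was already added
        }
        return kept;
    }
}
```

Passing new ThingsEqualityComparer() as the comparer would treat two Things with the same ID as duplicates; the callback is where the logging call would go.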

Upvotes: 0

ΩmegaMan

Reputation: 31626

Why group when one can use the Aggregate function to determine the duplicates for the report and the result?

Example

var items = new List<string>() { "Alpha", "Alpha", "Beta", "Gamma", "Alpha"};

var duplicatesDictionary = 
     items.Aggregate (new Dictionary<string, int>(),  
                      (results, itm) => 
                                       {
                                         if (results.ContainsKey(itm))
                                            results[itm]++;
                                         else
                                           results.Add(itm, 1);

                                         return results;
                                  });

The resulting dictionary holds the number of times each item occurs: Alpha 3, Beta 1, Gamma 1.

Now extract the duplicates report for any count above 1.

// Value is the total number of occurrences, so the number of duplicates is Value - 1
duplicatesDictionary.Where (kvp => kvp.Value > 1)
         .Select (kvp => string.Format("{0} had {1} duplicates", kvp.Key, kvp.Value - 1));

The final, deduplicated result is then just the dictionary's keys.

 duplicatesDictionary.Select (kvp => kvp.Key);
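
For reference, the counting step can be wrapped in a small reusable method. This is a minimal sketch assuming C# 8 or later; the name CountOccurrences is hypothetical, and TryGetValue stands in for the ContainsKey check so each item needs only one lookup before the write.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateCounts
{
    // Hypothetical helper: folds the sequence into a dictionary that maps
    // each distinct item to the number of times it occurs.
    public static Dictionary<T, int> CountOccurrences<T>(IEnumerable<T> source)
        where T : notnull
    {
        return source.Aggregate(new Dictionary<T, int>(), (counts, item) =>
        {
            // current is 0 when the item has not been seen yet
            counts.TryGetValue(item, out var current);
            counts[item] = current + 1;
            return counts;
        });
    }
}
```

Entries with a count above 1 are the duplicated items, and the dictionary's Keys collection is the deduplicated set. Note that Dictionary<TKey, TValue> does not document an enumeration order, so sort the keys if the output order matters.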

Upvotes: 0
