Linq marking duplicates in a list base on multiple fields

Question

A list of Products is imported from another process the agreed rules state that each batch cannot have duplicates in the combined fields 1-Name, 2-Category and 3-Section.
If duplicates found then each list items Status fields is set to -1 where duplicate we found.

The below code is my attempt using Linq it does return the correct results, however it does not feel smooth and could be difficult to maintain. I would like to make it more generic, readable and extensible would this be the appropriate place to add an extension method passing in the columns to verify for duplicates? If so what would this look like?

public class Product
{
    public int BatchId {get;set;}
    public string Name { get; set; }
    public string Category { get; set; }
    public char Section { get; set; }
    public int Status { get; set; }
    public string Location {get;set;}
}

public enum ImportStatus
{
    Duplicate = -1,
    AwaitingProcess = 0
}

void Main()
{
    List productList = new List()
    { 
        new Product(){BatchId =1,Name="Ginger",Category="Fresh", Section=   'B',Status=  0,   Location="Produce"}
        ,new Product(){BatchId=1,Name="turmeric",Category="Fresh",   Section=   'B',Status=  0, Location="Front Counter"}
        ,new Product(){BatchId=1,Name="LEMON",Category="Fruit",   Section=  'B',Status=  0, Location="Name Area"}
        ,new Product(){BatchId=1,Name="LEMON",Category="Fruit",   Section=  'B',Status=  0, Location="Outer Court"}
        ,new Product(){BatchId=1,Name="lettuce",Category="Produce", Section='X',Status=  0, Location="Produce"}
        ,new Product(){BatchId=1,Name="Lettuce",Category="Product", Section='X',Status=  0, Location="Freezer"}
        ,new Product(){BatchId=1,Name="Apple",Category="Fruit",   Section=  'S',Status=  0, Location="Product"}
        ,new Product(){BatchId=1,Name="Apple",Category="Fruit", Section=    'S',Status=  0, Location="Front Counter"}
    };


    List filteredProductList  =productList
    .Where (l => l.Status==0)
    .GroupBy (r=>new {  r.Name,r.Category,r.Section},( grp, tbl)=> new{GROUP=grp,TBL=tbl})
    .Where (r => r.TBL.Count()>1)
    .SelectMany(r=>r.TBL)
    .Select(r=>new Product {BatchId=r.BatchId, Name=r.Name,Category=r.Category,Section= r.Section,Location=r.Location,Status=(int)ImportStatus.Duplicate})
    .Union(productList   
    .Where (l => l.Status==0)
    .GroupBy (r=>new  {  r.Name,r.Category,r.Section},( grp, tbl)=> new{GROUP=grp,TBL=tbl})
    .Where (r => r.TBL.Count()==1)
    .SelectMany(r=>r.TBL)
    .Select (r =>new Product{BatchId=r.BatchId, Name=r.Name,Category=r.Category,Section= r.Section,Location=r.Location,Status=0}) 
    )
    .ToList();

    // Result of FilterProductionList =
    // 1 LEMON Fruit B -1 Name Area 
    // 1 LEMON Fruit B -1 Outer Court 
    // 1 Apple Fruit S -1 Product 
    // 1 Apple Fruit S -1 Front Counter 
    // 1 Ginger Fresh B 0 Produce 
    // 1 turmeric Fresh B 0 Front Counter 
    // 1 lettuce Produce X 0 Produce 
    // 1 Lettuce Product X 0 Freezer
}

Bae · Accepted Answer

I could not follow your logic at first glance. This solution will be better for readability and maintainability.

foreach (Product product in productList) 
{
    int dupCount = productList.Count(p => p.Name == product.Name 
                                     && p.Category == product.Category 
                                     && p.Section == product.Section);
    if (dupCount > 1) 
        product.Status = (int)ImportStatus.Duplicate;
}

Edit: It's not always better to use LINQ expressions, especially when you're calling multiple .Where(), .Select(), and .GroupBy(). Don't always rule out loops when we can leverage them here. In this solution you can avoid wasting any extra space introducing another data structure to hold your new filtered product list. Also, instead of instantiating new Product objects, we can keep the original and just update their Status properties.

Linq marking duplicates in a list base on multiple fields

Answers (2)

Related Questions