Tinypond
Tinypond

Reputation: 51

Linq marking duplicates in a list base on multiple fields

A list of Products is imported from another process the agreed rules state that each batch cannot have duplicates in the combined fields 1-Name, 2-Category and 3-Section.
If duplicates found then each list items Status fields is set to -1 where duplicate we found.

The below code is my attempt using Linq it does return the correct results, however it does not feel smooth and could be difficult to maintain. I would like to make it more generic, readable and extensible would this be the appropriate place to add an extension method passing in the columns to verify for duplicates? If so what would this look like?

public class Product
{
    public int BatchId {get;set;}
    public string Name { get; set; }
    public string Category { get; set; }
    public char Section { get; set; }
    public int Status { get; set; }
    public string Location {get;set;}
}

public enum ImportStatus
{
    Duplicate = -1,
    AwaitingProcess = 0
}

void Main()
{
    List<Product> productList = new List<Product>()
    { 
        new Product(){BatchId =1,Name="Ginger",Category="Fresh", Section=   'B',Status=  0,   Location="Produce"}
        ,new Product(){BatchId=1,Name="turmeric",Category="Fresh",   Section=   'B',Status=  0, Location="Front Counter"}
        ,new Product(){BatchId=1,Name="LEMON",Category="Fruit",   Section=  'B',Status=  0, Location="Name Area"}
        ,new Product(){BatchId=1,Name="LEMON",Category="Fruit",   Section=  'B',Status=  0, Location="Outer Court"}
        ,new Product(){BatchId=1,Name="lettuce",Category="Produce", Section='X',Status=  0, Location="Produce"}
        ,new Product(){BatchId=1,Name="Lettuce",Category="Product", Section='X',Status=  0, Location="Freezer"}
        ,new Product(){BatchId=1,Name="Apple",Category="Fruit",   Section=  'S',Status=  0, Location="Product"}
        ,new Product(){BatchId=1,Name="Apple",Category="Fruit", Section=    'S',Status=  0, Location="Front Counter"}
    };


    List<Product> filteredProductList  =productList
    .Where (l => l.Status==0)
    .GroupBy (r=>new {  r.Name,r.Category,r.Section},( grp, tbl)=> new{GROUP=grp,TBL=tbl})
    .Where (r => r.TBL.Count()>1)
    .SelectMany(r=>r.TBL)
    .Select(r=>new Product {BatchId=r.BatchId, Name=r.Name,Category=r.Category,Section= r.Section,Location=r.Location,Status=(int)ImportStatus.Duplicate})
    .Union(productList   
    .Where (l => l.Status==0)
    .GroupBy (r=>new  {  r.Name,r.Category,r.Section},( grp, tbl)=> new{GROUP=grp,TBL=tbl})
    .Where (r => r.TBL.Count()==1)
    .SelectMany(r=>r.TBL)
    .Select (r =>new Product{BatchId=r.BatchId, Name=r.Name,Category=r.Category,Section= r.Section,Location=r.Location,Status=0}) 
    )
    .ToList();

    // Result of FilterProductionList =
    // 1 LEMON Fruit B -1 Name Area 
    // 1 LEMON Fruit B -1 Outer Court 
    // 1 Apple Fruit S -1 Product 
    // 1 Apple Fruit S -1 Front Counter 
    // 1 Ginger Fresh B 0 Produce 
    // 1 turmeric Fresh B 0 Front Counter 
    // 1 lettuce Produce X 0 Produce 
    // 1 Lettuce Product X 0 Freezer
}

Upvotes: 2

Views: 242

Answers (2)

Gert Arnold
Gert Arnold

Reputation: 109185

No need to re-create all products. The most efficient way is to loop through the products having duplicates and set their Status property:

foreach (var product in productList
    .Where (p => p.Status == 0)
    .GroupBy (p => new { p.Name, p.Category, p.Section },
        (grp, tbl) => new { GROUP= grp, TBL= tbl } )
    .Where (g => g.TBL.Count() > 1)
    .SelectMany(g => g.TBL).ToList())
{
    product.Status = -1;
}

Upvotes: 1

Bae
Bae

Reputation: 88

I could not follow your logic at first glance. This solution will be better for readability and maintainability.

foreach (Product product in productList) 
{
    int dupCount = productList.Count(p => p.Name == product.Name 
                                     && p.Category == product.Category 
                                     && p.Section == product.Section);
    if (dupCount > 1) 
        product.Status = (int)ImportStatus.Duplicate;
}

Edit: It's not always better to use LINQ expressions, especially when you're calling multiple .Where(), .Select(), and .GroupBy(). Don't always rule out loops when we can leverage them here. In this solution you can avoid wasting any extra space introducing another data structure to hold your new filtered product list. Also, instead of instantiating new Product objects, we can keep the original and just update their Status properties.

Upvotes: 2

Related Questions