Reputation: 269
I have classes like this
public class Category
{
    public int CategoryID { get; set; }
    public List<Product> ProductList { get; set; }
}
public class Product
{
    public int CategoryID { get; set; }
}
List<Category> has around 15k rows and List<Product> around 40k rows. I'm populating each category's product list with LINQ like this
CategoryList.ForEach(i =>
{
i.ProductList = ProductList.Where(x => x.CategoryID == i.CategoryID).ToList();
});
I want to know whether there is any implementation with better performance for this. The data could increase or decrease in other cases.
Upvotes: 9
Views: 1689
Reputation: 13676
After testing performance I found that a better solution than the one I posted before is this:
static List<Category> FilterList(List<Category> list, List<Product> ProductList)
{
    Parallel.For(0, list.Count, i =>
    {
        list[i].ProductList = new List<Product>();
        for (int i2 = 0; i2 < ProductList.Count; i2++)
        {
            if (ProductList[i2].CategoryID == list[i].CategoryID)
                list[i].ProductList.Add(ProductList[i2]);
        }
    });
    return list;
}
And then list = FilterList(list, products);
For 3,500,000 operations it took only 200 ms.
Upvotes: 2
Reputation: 1562
You can also use a GroupJoin instead of multiple Where calls:
https://msdn.microsoft.com/en-us/library/bb397905.aspx
While Parallel will give you at most a speedup proportional to the number of cores, GroupJoin gives you linear complexity.
From @CodesInChaos: "If the number of categories and products are on the same scale, you'll get a speedup from quadratic to linear runtime."
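For reference, a minimal sketch of the GroupJoin approach, assuming the Category/Product classes from the question (the names are illustrative). GroupJoin builds a hash table over the inner sequence once, so the total work is O(categories + products) rather than O(categories × products):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product  { public int CategoryID; }
public class Category { public int CategoryID; public List<Product> ProductList; }

public static class GroupJoinDemo
{
    public static void Main()
    {
        var categories = new List<Category>
        {
            new Category { CategoryID = 1 },
            new Category { CategoryID = 2 },
        };
        var products = new List<Product>
        {
            new Product { CategoryID = 1 },
            new Product { CategoryID = 1 },
            new Product { CategoryID = 2 },
        };

        // GroupJoin pairs each category with the sequence of products
        // sharing its key; the products are hashed once up front.
        var joined = categories.GroupJoin(
            products,
            c => c.CategoryID,                            // outer key selector
            p => p.CategoryID,                            // inner key selector
            (c, ps) => new { Category = c, Products = ps });

        foreach (var item in joined)
            item.Category.ProductList = item.Products.ToList();

        Console.WriteLine(categories[0].ProductList.Count); // 2
        Console.WriteLine(categories[1].ProductList.Count); // 1
    }
}
```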
Upvotes: 4
Reputation: 108880
I think it's silly to use parallelism when a simple change to the algorithm can speed it up a thousandfold. Assuming CategoryID has a decent implementation of GetHashCode(), a dictionary/lookup has nearly O(1) lookup times, compared to the O(n) time of scanning through a list.
Two possible approaches:
Assuming categories have unique IDs you can use:
var categoryById = categories.ToDictionary(c => c.CategoryID);
foreach(var product in products)
{
    var category = categoryById[product.CategoryID];
    category.ProductList.Add(product);
}
This has runtime O(products.Count + categories.Count) and uses O(categories.Count) memory.
If categories don't start out with an empty list, you might need to create it.
var productsByCategory = products.ToLookup(product => product.CategoryID);
foreach(var category in categories)
{
    category.ProductList = productsByCategory[category.CategoryID].ToList();
}
This has runtime O(products.Count + categories.Count) and uses O(products.Count) memory.
Since there are typically more products than categories, this approach will require more memory. On the other hand the lookup might remove the need to embed a list of products in the category object.
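To illustrate that last point, a small sketch of querying the lookup directly instead of storing a list on each category (names are illustrative, not from the original answer). An ILookup is built once in O(n), is indexed in O(1), and a missing key yields an empty sequence instead of throwing like a Dictionary indexer would:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product { public int CategoryID; }

public static class LookupDemo
{
    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { CategoryID = 1 },
            new Product { CategoryID = 1 },
            new Product { CategoryID = 3 },
        };

        // Immutable multi-map from category ID to its products.
        ILookup<int, Product> byCategory = products.ToLookup(p => p.CategoryID);

        Console.WriteLine(byCategory[1].Count()); // 2
        Console.WriteLine(byCategory[2].Count()); // 0 (missing key, no exception)
    }
}
```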
Upvotes: 12
Reputation: 495
If CategoryList already exists, you can first group the products by CategoryID and then join the grouped products with the categories.
var query =
from p in ProductList
group p by p.CategoryID
into grouped
join c in CategoryList
on grouped.Key equals c.CategoryID
select new { c, grouped };
Then use a loop to set the ProductList field of the Category objects.
foreach (var item in query)
item.c.ProductList = item.grouped.ToList();
Upvotes: 2
Reputation: 668
You may gain performance by organizing the products grouped per category, e.g. in a Dictionary<int, List<Product>>. You then only have to pick the category's product list by its key instead of executing a LINQ query over the whole product list and creating new sub-lists.
var ProductDict = ProductList
    .GroupBy(p => p.CategoryID)
    .ToDictionary(g => g.Key, g => g.ToList());
CategoryList.ForEach(i =>
{
    i.ProductList = ProductDict.TryGetValue(i.CategoryID, out var list) ? list : new List<Product>();
});
Upvotes: 3
Reputation: 1871
Actually, the most efficient solution in such a case would be to create the CategoryList from the ProductList. Something like this:
CategoryList = ProductList
    .GroupBy(p => p.CategoryID)
    .Select(g => new Category
    {
        CategoryID = g.Key,
        ProductList = g.ToList()
    })
    .ToList();
Upvotes: 2
Reputation: 2296
You can use the parallel implementation of LINQ, PLINQ. PLINQ usually speeds up a LINQ query because it uses all available cores more efficiently.
CategoryList.AsParallel().ForAll(i =>
{
    i.ProductList = ProductList.AsParallel().Where(x => x.CategoryID == i.CategoryID).ToList();
});
Upvotes: 5