Reputation: 269
I have classes like this
public class Category
{
    public int CategoryID { get; set; }
    public List<Product> ProductList { get; set; }
}
public class Product
{
    public int CategoryID { get; set; }
}
List<Category> has around 15k rows and List<Product> around 40k rows. I'm populating each category's product list with LINQ like this
CategoryList.ForEach(i =>
{
i.ProductList = ProductList.Where(x => x.CategoryID == i.CategoryID).ToList();
});
I want to know whether there is any implementation with better performance for this. The data could increase or decrease in other cases.
Upvotes: 9
Views: 1689
Reputation: 13676
After testing performance I found that a better solution than the one I posted before is this:
static List<Category> FilterList(List<Category> list, List<Product> ProductList)
{
    Parallel.For(0, list.Count, i =>
    {
        list[i].ProductList = new List<Product>();
        for (int i2 = 0; i2 < ProductList.Count; i2++)
        {
            if (ProductList[i2].CategoryID == list[i].CategoryID)
                list[i].ProductList.Add(ProductList[i2]);
        }
    });
    return list;
}
And then list = FilterList(list, products);
For 3,500,000 operations it took only 200 ms.
Upvotes: 2
Reputation: 1562
You can also use a GroupJoin instead of multiple Where calls:
https://msdn.microsoft.com/en-us/library/bb397905.aspx
While Parallel will give you at most a speedup proportional to the number of cores, GroupJoin gives you linear complexity.
From @CodesInChaos: "If the number of categories and products are on the same scale, you'll get a speedup from quadratic to linear runtime."
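For reference, a minimal sketch of the GroupJoin approach, assuming the Category/Product classes from the question (the names are illustrative). GroupJoin builds a hash table over the inner sequence once, so the total work is O(categories + products) rather than O(categories × products):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product  { public int CategoryID; }
public class Category { public int CategoryID; public List<Product> ProductList; }

public static class GroupJoinDemo
{
    public static void Main()
    {
        var categories = new List<Category>
        {
            new Category { CategoryID = 1 },
            new Category { CategoryID = 2 },
        };
        var products = new List<Product>
        {
            new Product { CategoryID = 1 },
            new Product { CategoryID = 1 },
            new Product { CategoryID = 2 },
        };

        // GroupJoin pairs each category with the sequence of products
        // sharing its key; the products are hashed once up front.
        var joined = categories.GroupJoin(
            products,
            c => c.CategoryID,                            // outer key selector
            p => p.CategoryID,                            // inner key selector
            (c, ps) => new { Category = c, Products = ps });

        foreach (var item in joined)
            item.Category.ProductList = item.Products.ToList();

        Console.WriteLine(categories[0].ProductList.Count); // 2
        Console.WriteLine(categories[1].ProductList.Count); // 1
    }
}
```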
Upvotes: 4
Reputation: 108880
I think it's silly to use parallelism when a simple change to the algorithm can speed it up a thousandfold. Assuming CategoryID has a decent implementation of GetHashCode(), a dictionary/lookup has nearly O(1) lookup times, compared to the O(n) time of scanning through a list.
Two possible approaches:
Assuming categories have unique IDs you can use:
var categoryById = categories.ToDictionary(c => c.CategoryID);
foreach(var product in products)
{
    var category = categoryById[product.CategoryID];
    category.ProductList.Add(product);
}
This has runtime O(products.Count + categories.Count) and uses O(categories.Count) memory.
If categories don't start out with an empty list, you might need to create it.
var productsByCategory = products.ToLookup(product => product.CategoryID);
foreach(var category in categories)
{
    category.ProductList = productsByCategory[category.CategoryID].ToList();
}
This has runtime O(products.Count + categories.Count) and uses O(products.Count) memory.
Since there are typically more products than categories, this approach will require more memory. On the other hand the lookup might remove the need to embed a list of products in the category object.
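To illustrate that last point, a small sketch of querying the lookup directly instead of storing a list on each category (names are illustrative, not from the original answer). An ILookup is built once in O(n), is indexed in O(1), and a missing key yields an empty sequence instead of throwing like a Dictionary indexer would:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product { public int CategoryID; }

public static class LookupDemo
{
    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { CategoryID = 1 },
            new Product { CategoryID = 1 },
            new Product { CategoryID = 3 },
        };

        // Immutable multi-map from category ID to its products.
        ILookup<int, Product> byCategory = products.ToLookup(p => p.CategoryID);

        Console.WriteLine(byCategory[1].Count()); // 2
        Console.WriteLine(byCategory[2].Count()); // 0 (missing key, no exception)
    }
}
```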
Upvotes: 12
Reputation: 495
If CategoryList already exists, you can first group the products by CategoryID and then join the grouped products with the categories.
var query =
from p in ProductList
group p by p.CategoryID
into grouped
join c in CategoryList
on grouped.Key equals c.CategoryID
select new { c, grouped };
Then use a loop to set the ProductList field of the Category objects.
foreach (var item in query)
item.c.ProductList = item.grouped.ToList();
Upvotes: 2
Reputation: 668
You may gain performance by organizing the products grouped per category, e.g. in a Dictionary<int, List<Product>>. You then only have to pick the category's product list by its key instead of executing a LINQ query over the whole product list and creating new sub-lists.
var ProductDict = ProductList
    .GroupBy(p => p.CategoryID)
    .ToDictionary(g => g.Key, g => g.ToList());
CategoryList.ForEach(i =>
{
    i.ProductList = ProductDict.TryGetValue(i.CategoryID, out var list) ? list : new List<Product>();
});
Upvotes: 3
Reputation: 1871
Actually, the most efficient solution in such a case would be to create the CategoryList from the ProductList. Something like this:
CategoryList = ProductList
    .GroupBy(p => p.CategoryID)
    .Select(g => new Category
    {
        CategoryID = g.Key,
        ProductList = g.ToList()
    })
    .ToList();
Upvotes: 2
Reputation: 2296
You can use the parallel implementation of LINQ, PLINQ. PLINQ usually speeds up a LINQ query because it uses all available cores more efficiently.
CategoryList.AsParallel().ForAll(i =>
{
    i.ProductList = ProductList.AsParallel().Where(x => x.CategoryID == i.CategoryID).ToList();
});
Upvotes: 5