Iraj
Iraj

Reputation: 1522

Improve loop performance with alternative solution

I have two list objects,Products and Prices ,

var products = GetProdusts(predicate);
var prices = GetAllProductPrices();

For fill Product.Prices with prices object :

Parallel.ForEach(products ,prod=> {

   prod.Prices= prices?.Where(pr =>pr.ProductId == prod.Id)?.ToList();
});

This loop take a long time. Can anyone help me how to improve this.

Update:

Base on @Holger comment:

var priceDics = prices.GroupBy(p=>p.ProductId).ToDictionary(p=>p.Key ,p=>p.ToList());

Parallel.ForEach(products ,prod=> {

  prod.Prices= priceDics?.Where(pr =>pr.Key== prod.Id)?.SelectMany(x=>x.Value).ToList();
});

and now performance is very improved.

Upvotes: 0

Views: 360

Answers (1)

Theodor Zoulias
Theodor Zoulias

Reputation: 43399

Here is a simple solution that uses a Lookup. It should be so fast that parallelism can't make it any faster.

var lookup = prices.ToLookup(p => p.ProductId);
foreach (var product in products)
{
    product.Prices = lookup[product.Id]?.ToList();
}

There is no much calculation happening inside the loop. The lookup[...] getter is almost instantaneous, and what is left is allocating memory for the lists, and copying the contents of each Grouping to a new List, which is an highly efficient, one-CPU-instruction, operation.


Update: Performance test with 500,000 products and 11,000,000 prices:

class Price { public int ProductId; }
class Product { public int Id; public List<Price> Prices; }

var products = Enumerable.Range(1, 500_000)
    .Select(n => new Product() { Id = n }).ToList();
var prices = Enumerable.Range(1, 11_000_000)
    .Select(n => new Price { ProductId = n % products.Count }).ToList();

var stopwatch = Stopwatch.StartNew();
var lookup = prices.ToLookup(p => p.ProductId);
Console.WriteLine($"Duration Lookup: {stopwatch.ElapsedMilliseconds:#,0} msec");
foreach (var product in products)
{
    product.Prices = lookup[product.Id]?.ToList();
}
Console.WriteLine($"Duration Total: {stopwatch.ElapsedMilliseconds:#,0} msec");

Output:

Duration Lookup: 4,051 msec
Duration Total: 4,695 msec

The slowest part of the process is populating the Lookup, which is not parallelizable. The final loop of assigning prices to products can be parallelized, but it takes less than a second.

Upvotes: 2

Related Questions