Reputation: 83
I am noticing a huge performance issue with trying to get a list of keys in a ConcurrentDictionary value object that exist in an IEnumerable collection as follows:
Customer object has: string CustomerNumber; string Location;
var CustomerDict = ConcurrentDictionary<string, Customer>();
var customers = IEnumerable<string>();
I am trying to get a list of the keys in the dictionary where the customers.CustomerNumber
is in the dictionary. What I have is below the removeItems takes a very long time to return:
var removeItems = CustomerDict
.Where(w => customers.Any(c => c == w.Value.CustomerNumber))
.Select(s => s.Key)
.ToList();
foreach(var item in removeItems)
{
CustomerDict.TryRemove(item, out _);
}
Any help would be much appreciated what best to do with this.
Upvotes: 0
Views: 344
Reputation: 7440
Why not just simply do this:
foreach(var customer in customers) //enumerate customers
CustomerDict.TryRemove(customer, out _); //trytoremove the customer, won't do anything if the customer isn't found
Upvotes: 0
Reputation: 831
I think its better to create HashSet
from customers
in order to look faster,
HashSet<string> customersHashSet = new HashSet<string>(customers);
var removeItems = CustomerDict
.Where(c => customersHashSet.Contains(c.Value.CustomerNumber))
.Select(s => s.Key);
foreach (var item in removeItems)
{
CustomerDict.TryRemove(item, out _);
}
When removing consider if you have many items in the HashSet ( relatively to the dictionary ) its maybe better to iterate over the dictionary and search in the HashSet, like this :
foreach (var item in CustomerDict.ToArray())
{
if (customersHashSet.Contains(item.Value.CustomerNumber))
CustomerDict.TryRemove(item.Key, out _);
}
Upvotes: 2
Reputation: 20372
Make customers
a HashSet<string>
, who's Contains
method is O(1)
:
var customers = HashSet<string>();
var removeItems = CustomerDict
.Where(w => customers.Contains(w.Value.CustomerNumber))
.Select(s => s.Key);
Currently, Any
is iterating over customers
every time which has an O(n)
complexity.
Also you're call to ToList
is superfluous: it adds an additional, unnecessary iteration over customers
, not to mention increased memory usage.
Upvotes: 3
Reputation: 5791
The problem is that .Any
will do a linear scan of the underlying collection, which in your case is the key collection of your concurrent dictionary. This takes linear effort. It would be better to dump the keys into a local HashSet and then check the inclusion via .Contains(w.Value.CustomerNumber)
. This becomes nearly constant effort.
Upvotes: 1