user2208746
user2208746

Reputation: 83

What is the fastest way to compare these two collections?

I am noticing a huge performance issue with trying to get a list of keys in a ConcurrentDictionary value object that exist in an IEnumerable collection as follows:

Customer object has: string CustomerNumber; string Location;

var CustomerDict = ConcurrentDictionary<string, Customer>();
var customers = IEnumerable<string>();

I am trying to get a list of the keys in the dictionary where the customers.CustomerNumber is in the dictionary. What I have is below the removeItems takes a very long time to return:

var removeItems = CustomerDict
    .Where(w => customers.Any(c => c == w.Value.CustomerNumber))
    .Select(s => s.Key)
    .ToList();

foreach(var item in removeItems)
{
   CustomerDict.TryRemove(item, out _);
}

Any help would be much appreciated what best to do with this.

Upvotes: 0

Views: 344

Answers (4)

Rand Random
Rand Random

Reputation: 7440

Why not just simply do this:

foreach(var customer in customers) //enumerate customers
   CustomerDict.TryRemove(customer, out _); //trytoremove the customer, won't do anything if the customer isn't found

Upvotes: 0

AnGG
AnGG

Reputation: 831

I think its better to create HashSet from customers in order to look faster,

HashSet<string> customersHashSet = new HashSet<string>(customers);

var removeItems = CustomerDict
                    .Where(c => customersHashSet.Contains(c.Value.CustomerNumber))
                    .Select(s => s.Key);

foreach (var item in removeItems)
{
    CustomerDict.TryRemove(item, out _);
}

When removing consider if you have many items in the HashSet ( relatively to the dictionary ) its maybe better to iterate over the dictionary and search in the HashSet, like this :

foreach (var item in CustomerDict.ToArray())
{
    if (customersHashSet.Contains(item.Value.CustomerNumber))
        CustomerDict.TryRemove(item.Key, out _);
}

Upvotes: 2

Johnathan Barclay
Johnathan Barclay

Reputation: 20372

Make customers a HashSet<string>, who's Contains method is O(1):

var customers = HashSet<string>();

var removeItems = CustomerDict
    .Where(w => customers.Contains(w.Value.CustomerNumber))
    .Select(s => s.Key);

Currently, Any is iterating over customers every time which has an O(n) complexity.

Also you're call to ToList is superfluous: it adds an additional, unnecessary iteration over customers, not to mention increased memory usage.

Upvotes: 3

Georg
Georg

Reputation: 5791

The problem is that .Any will do a linear scan of the underlying collection, which in your case is the key collection of your concurrent dictionary. This takes linear effort. It would be better to dump the keys into a local HashSet and then check the inclusion via .Contains(w.Value.CustomerNumber). This becomes nearly constant effort.

Upvotes: 1

Related Questions