Martin Verjans
Martin Verjans

Reputation: 4796

Select Items based on a list containing the IDs

I have a list of Items, each containing a field of Type integer.

I want to filter my list to get only the items that match a given list of integers.

The code I have now works but I know it could be optimized.

Class Item
{
    int ID;

    //Other fields & methods that are irrelevant here
}

//Selection method
IEnumerable<Item> SelectItems(List<Item> allItems, List<int> toSelect)
{
     return allItems.Where(x => toSelect.Contains(x.ID));
}

The problem I have is that I iterate through allItems and in each iteration I iterate through toSelect.

I have the feeling it is possible to be much more effective but I don't know how I can achieve this with Linq.

This might also be an already asked question as I don't know how this is called in English. This feels kind of stupid because I don't know how to formulate this properly in a seach engine.

Upvotes: 1

Views: 376

Answers (2)

Tim Schmelter
Tim Schmelter

Reputation: 460058

You can use Join which is more efficient because it's using a set based approach:

var selectedItems = from item in allItems
                    join id in toSelect
                    on item.Id equals id
                    select item;
return selectedItems;

Another way which is more efficient is to use a HashSet<int> instead of a list:

IEnumerable<Item> SelectItems(List<Item> allItems, HashSet<int> toSelect)
{
     return allItems.Where(x => toSelect.Contains(x.ID));
}

Upvotes: 5

Marc Gravell
Marc Gravell

Reputation: 1062600

There are two ways to approach this.

Currently you have O(N×M) performance (where N is the size of allItems and M is the size of toSelect).

If you're just trying to reduce it easily, then you could reduce it to O(N)+O(M) by creating a hash-set of toSelect:

var matches = new HashSet<int>(toSelect);
return allItems.Where(x => matches.Contains(x.ID));

However, this is still going to be dominated by N - the size of allItems.


A better long term approach may be to pre-index the data (and keep it indexed) by Id. So instead of allItems being a List<T> - it could be a Dictionary<int, T>. Note that building the dictionary can be expensive, so you don't want to do this every time you want to search : the key is to do this once at the start (and keep it maintained). Then this becomes O(M) (the size of toSelect, which is usually small), since dictionary lookups are O(1).

IEnumerable<Item> SelectItems(Dictionary<int, Item> allItems, List<int> toSelect)
{
     foreach(var id in toSelect)
     {
         if (allItems.TryGetValue(id, out var found))
             yield return found;
     }
}

(there is no need to pre-hash toSelect since we aren't checking it for Contains)

Upvotes: 3

Related Questions