Reputation: 4796
I have a list of Items, each containing a field of Type integer.
I want to filter my list to get only the items that match a given list of integers.
The code I have now works but I know it could be optimized.
Class Item
{
int ID;
//Other fields & methods that are irrelevant here
}
//Selection method
IEnumerable<Item> SelectItems(List<Item> allItems, List<int> toSelect)
{
return allItems.Where(x => toSelect.Contains(x.ID));
}
The problem I have is that I iterate through allItems
and in each iteration I iterate through toSelect
.
I have the feeling it is possible to be much more effective but I don't know how I can achieve this with Linq.
This might also be an already asked question as I don't know how this is called in English. This feels kind of stupid because I don't know how to formulate this properly in a seach engine.
Upvotes: 1
Views: 376
Reputation: 460058
You can use Join
which is more efficient because it's using a set based approach:
var selectedItems = from item in allItems
join id in toSelect
on item.Id equals id
select item;
return selectedItems;
Another way which is more efficient is to use a HashSet<int>
instead of a list:
IEnumerable<Item> SelectItems(List<Item> allItems, HashSet<int> toSelect)
{
return allItems.Where(x => toSelect.Contains(x.ID));
}
Upvotes: 5
Reputation: 1062600
There are two ways to approach this.
Currently you have O(N×M)
performance (where N
is the size of allItems
and M
is the size of toSelect
).
If you're just trying to reduce it easily, then you could reduce it to O(N)+O(M)
by creating a hash-set of toSelect
:
var matches = new HashSet<int>(toSelect);
return allItems.Where(x => matches.Contains(x.ID));
However, this is still going to be dominated by N
- the size of allItems
.
A better long term approach may be to pre-index the data (and keep it indexed) by Id
. So instead of allItems
being a List<T>
- it could be a Dictionary<int, T>
. Note that building the dictionary can be expensive, so you don't want to do this every time you want to search : the key is to do this once at the start (and keep it maintained). Then this becomes O(M)
(the size of toSelect
, which is usually small), since dictionary lookups are O(1)
.
IEnumerable<Item> SelectItems(Dictionary<int, Item> allItems, List<int> toSelect)
{
foreach(var id in toSelect)
{
if (allItems.TryGetValue(id, out var found))
yield return found;
}
}
(there is no need to pre-hash toSelect
since we aren't checking it for Contains
)
Upvotes: 3