Reputation: 13945
I need to filter a List<object>
so that I remove any items where a string
property does not exist inside another List<string>
.
I created this console app just to make sure I had the LINQ syntax correct:
class FooBar
{
public int Id { get; set; }
public string ValueName { get; set; }
}
and then...
List<FooBar> foobars = new List<FooBar>
{
new FooBar { Id = 1, ValueName = "Val1" },
new FooBar { Id = 2, ValueName = "Val2" },
new FooBar { Id = 3, ValueName = "Val3" },
new FooBar { Id = 4, ValueName = "Val4" }
};
List<string> myStrings = new List<string>
{
"Val1",
"Val3"
};
// Only keep records where ValueName is found in `myStrings`
foobars = foobars.Where(f => myStrings.Contains(f.ValueName)).ToList();
So, this line:
foobars = foobars.Where(f => myStrings.Contains(f.ValueName)).ToList();
does exactly what I want, and it gives me back these two records:
{ Id = 1, ValueName = "Val1" }
{ Id = 3, ValueName = "Val3" }
All's well. BUT... in the actual application, foobars
has over 200k items, and myStrings
has about 190k. And when that LINQ line gets executed, it takes upwards of 5 minutes to complete.
I'm clearly doing something wrong. 200k records isn't THAT big. And the real FooBar
isn't all that complex (no nested objects, and only 9 properties).
What's going on here?
Upvotes: 2
Views: 175
Reputation: 36710
The problem here is that you're doing foobars.Where(f => myStrings.Contains(f.ValueName))
, so for every item in foobars
your are checking all items of myStrings
.
That scales quadratic. Also called O(n^2), read more here. So if you have 10+10 items, you do 100 checks(10*10), and if you have 10,000+10,000 items, you will do 100,000,000 checks. In your case you're doing 38,000,000,000+ checks ;)
Solution: create a HashSet from myStrings
and use Contains
of the HashSet
.
e.g. replace with:
var myStringsSet = new HashSet<string>(myStrings);
foobars = foobars.Where(f => myStringsSet.Contains(f.ValueName)).ToList();
Now with 10,000+10,000 items, you will do 10,000 checks instead of 100,000,000. In your case that will 200,000 checks instead of 38,000,000,000.
Upvotes: 6