Casey Crookston
Casey Crookston

Reputation: 13945

LINQ .Where query takes 5+ minutes to execute

I need to filter a List<object> so that I remove any items where a string property does not exist inside another List<string>.

I created this console app just to make sure I had the LINQ syntax correct:

class FooBar
{
    public int Id { get; set; }
    public string ValueName { get; set; }
}

and then...

List<FooBar> foobars = new List<FooBar>
{
    new FooBar { Id = 1, ValueName = "Val1" },
    new FooBar { Id = 2, ValueName = "Val2" },
    new FooBar { Id = 3, ValueName = "Val3" },
    new FooBar { Id = 4, ValueName = "Val4" }
};

List<string> myStrings = new List<string>
{
    "Val1",
    "Val3"
};

// Only keep records where ValueName is found in `myStrings`
foobars = foobars.Where(f => myStrings.Contains(f.ValueName)).ToList();

So, this line:

foobars = foobars.Where(f => myStrings.Contains(f.ValueName)).ToList();

does exactly what I want, and it gives me back these two records:

{ Id = 1, ValueName = "Val1" }
{ Id = 3, ValueName = "Val3" }

All's well. BUT... in the actual application, foobars has over 200k items, and myStrings has about 190k. And when that LINQ line gets executed, it takes upwards of 5 minutes to complete.

I'm clearly doing something wrong. 200k records isn't THAT big. And the real FooBar isn't all that complex (no nested objects, and only 9 properties).

What's going on here?

Upvotes: 2

Views: 175

Answers (1)

Julian
Julian

Reputation: 36710

The problem here is that you're doing foobars.Where(f => myStrings.Contains(f.ValueName)) , so for every item in foobars your are checking all items of myStrings.

That scales quadratic. Also called O(n^2), read more here. So if you have 10+10 items, you do 100 checks(10*10), and if you have 10,000+10,000 items, you will do 100,000,000 checks. In your case you're doing 38,000,000,000+ checks ;)

Solution: create a HashSet from myStrings and use Contains of the HashSet.

e.g. replace with:

var myStringsSet = new HashSet<string>(myStrings);
foobars = foobars.Where(f => myStringsSet.Contains(f.ValueName)).ToList();

Now with 10,000+10,000 items, you will do 10,000 checks instead of 100,000,000. In your case that will 200,000 checks instead of 38,000,000,000.

Upvotes: 6

Related Questions