NibblyPig
NibblyPig

Reputation: 52952

Why did my code work when I changed it from IEnumerable to List?

I'm trying to get my head around this rather than just chalking it up to general voodoo.

I do an EF query and get some data back, and I .ToList() it, like this:

IEnumerable<DatabaseMatch<CatName>> nameMatches = nameLogicMatcher.Match(myIQueryableOfCats).ToList();

Some cats appear twice in the database because they have multiple names, but each cat has a primary name. So in order to filter this down, I get all of the ids of the cats in a list:

List<int> catIds = nameMatches.Select(c => c.Match.CatId).ToList();

I then iterate through all of the distinct ids, get all of the matching cat names, and remove anything that isn't a primary name from the list, like this:

foreach (int catId in catIds.Distinct())
{
    var allCatNameMatches = nameMatches.Where(c => c.Match.CatId == catId);

    var primaryMatch = allCatNameMatches.FirstOrDefault(c => c.Match.NameType == "Primary Name");

    nameMatches = nameMatches.Except(allCatNameMatches.Where(c => c != primaryMatch)); 
}

Now this code, when I first ran it, just hung. Which I thought was odd. I stepped through it, and it seemed to work but after 10 iterations (it is capped at 100 cats in total) it started to slow down and then eventually it was glacial and then hung completely.

I thought maybe it was doing some intensive database work by mistake, but the profiler shows no SQL executed except that which retrieves the initial list of cat names.

I decided to change it from IEnumerable of nameMatches to a List, and put the appropriate .ToList() on the last line. It worked instantly and perfectly after I did this.

The question I'd like to ask is, why?

Upvotes: 2

Views: 167

Answers (2)

Tim Rogers
Tim Rogers

Reputation: 21723

Without the ToList() you are building up in nameMatches a nested chain of IEnumerables awaiting delayed execution. This might not be so bad, except you are also calling FirstOrDefault on each iteration which will execute the chain. So on iteration number n, you are executing the filter operations contained in the loop n-1 times. If you had 1000 distinct cats, the Linq chain is getting executed 1000 + 99 + ... + 1 times. (I think you have something that is O(n³)!)

The moral is, if you want to use delayed execution, make very sure that you're only executing your chain once.

Upvotes: 3

Vladimir
Vladimir

Reputation: 7475

Let's simplify your code a little:

foreach (int catId in catIds.Distinct())
{
    var allCatNameMatches = nameMatches.Where(c => c.Match.CatId == catId);
    var primaryMatch = null;
    nameMatches = nameMatches.Except(allCatNameMatches.Where(c => c != primaryMatch)); 
}

And a little more:

foreach (int catId in catIds.Distinct())
{
    nameMatches = nameMatches.Where(c => c.Match.CatId == catId);
    var primaryMatch = null;
    nameMatches = nameMatches.Except(nameMatches.Where(c => c != primaryMatch)); 
}

In the latter one it is obvious that due to deferred execution each pass of foreach body lengthens the chain of Where and Except. Then remember var primaryMatch = allCatNameMatches.FirstOrDefault. It is not deferred executed, so in each iteration of foreach it should execute all chain. Therefore it hangs.

Upvotes: 0

Related Questions