Anthony Shaw

Reputation: 8166

IEnumerable Where filtering occurring without actually being called

I'm using HtmlAgilityPack to parse a page of HTML and retrieve a number of option elements from a select list.

GvsaDivisions is a method that returns raw HTML from the result of a POST; it is irrelevant in the context of the question.

public IEnumerable<SelectListItem> Divisions(string season, string gender, string ageGroup)
{
    var document = new HtmlDocument();
    var html = GvsaDivisions(season);
    document.LoadHtml(html);

    var options = document.DocumentNode.SelectNodes("//select//option").Select(x => new SelectListItem() { Value = x.GetAttributeValue("value", ""), Text = x.NextSibling.InnerText });

    var divisions = options.Where(x => x.Text.Contains(string.Format("{0} {1}", ageGroup, gender)));
    if (ageGroup == "U15/U16")
    {
        ageGroup = "U15/16";
    }
    if (ageGroup == "U17/U19")
    {
        ageGroup = "U17/19";
    }

    return divisions;
}

What I'm observing is this: once options.Where() is executed, divisions contains a single result. After the test ageGroup == "U15/U16" and the assignment ageGroup = "U15/16", divisions contains 3 results (the original one, plus two new ones matching the new value of ageGroup).

Can anybody explain this anomaly? I expected to have to Union the result of a new Where query with the original results, but it seems to be happening automagically. While the results are what I want, I have no way to explain how it's happening (or any certainty that it will continue to behave this way).

Upvotes: 0

Views: 115

Answers (3)

Moha Dehghan

Reputation: 18463

LINQ queries use deferred execution, which means they are not run when you define them, but each time you enumerate the result.

When you change a variable that is being used in your query, you actually are changing the result of the next run of the query, which is the next time you iterate the result.
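To illustrate the point above, here is a minimal sketch (the class name, list contents, and variable names are made up for this example and are not from the question). It enumerates the same query twice and changes the captured variable in between:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Enumerates the same query before and after the captured variable changes.
    public static (List<string> Before, List<string> After) Run()
    {
        var fruits = new List<string> { "apple", "orange", "pear" };
        string match = "orange";

        // Nothing is filtered here; the lambda only captures the variable 'match'.
        IEnumerable<string> query = fruits.Where(f => f == match);

        var before = query.ToList(); // first enumeration: match is "orange"
        match = "pear";
        var after = query.ToList();  // second enumeration: match is "pear"

        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        Console.WriteLine(string.Join(",", before)); // orange
        Console.WriteLine(string.Join(",", after));  // pear
    }
}
```

The query object itself never changes; only the value it reads from the captured variable does, so each enumeration can give a different result.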

Read more about this here and here:

This is actually by design, and in many situations it is very useful, and sometimes necessary. But if you need immediate evaluation, you can call the ToList() method at the end of your query, which materializes your query and gives you a normal List<T> object.
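As a rough sketch of the difference (again with hypothetical names and data, not the code from the question), compare a deferred query with one that was materialized with ToList() before the variable changed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Builds one deferred query and one materialized list, then changes the filter value.
    public static (List<string> Deferred, List<string> Snapshot) Run()
    {
        var fruits = new List<string> { "apple", "orange", "pear" };
        string match = "orange";

        // Deferred: evaluated later, using whatever 'match' holds at that time.
        IEnumerable<string> deferred = fruits.Where(f => f == match);

        // Materialized: evaluated right now, while 'match' is still "orange".
        List<string> snapshot = fruits.Where(f => f == match).ToList();

        match = "pear";

        return (deferred.ToList(), snapshot);
    }

    public static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine(string.Join(",", deferred)); // pear
        Console.WriteLine(string.Join(",", snapshot)); // orange
    }
}
```

Applied to the question, this suggests that calling ToList() on the Where result would freeze the single match found with the original ageGroup, so the later assignments would no longer affect it.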

Upvotes: 6

mgmedick

Reputation: 700

I'm thinking along the same lines as Travis: the deferred execution of LINQ.

I'm not sure if this will avoid the issue, but I generally put my results into a concrete collection right away, like this. In my experience, once you put the results into a real, defined collection, execution is no longer deferred.

List<SelectListItem> options = document.DocumentNode
    .SelectNodes("//select//option")
    .Select(x => new SelectListItem() { Value = x.GetAttributeValue("value", ""), Text = x.NextSibling.InnerText })
    .Where(x => x.Text.Contains(string.Format("{0} {1}", ageGroup, gender)))
    .ToList();

Upvotes: 0

brader24

Reputation: 485

The divisions variable contains an unprocessed enumerator that calls the code x.Text.Contains(string.Format("{0} {1}", ageGroup, gender)) on each element in the list of nodes. Since you change ageGroup before you process that enumerator, it uses that new value instead of the old value.

For example, the following code outputs a single line with the text "pear":

List<string> strings = new List<string> { "apple", "orange", "pear", "watermelon" };
string matchString = "orange";

var queryOne = strings.Where(x => x == matchString);
matchString = "pear";

foreach (var item in queryOne)
{
    Console.WriteLine("   " + item);
}

Upvotes: 1
