TheGateKeeper

Reputation: 4530

Using HTMLAgilityPack to get all the values of a select element

Here is what I have so far:

HtmlAgilityPack.HtmlDocument ht = new HtmlAgilityPack.HtmlDocument();

TextReader reader = File.OpenText(@"C:\Users\TheGateKeeper\Desktop\New folder\html.txt");
ht.Load(reader);

reader.Close();

HtmlNode select= ht.GetElementbyId("cats[]");

List<HtmlNode> options = new List<HtmlNode>();

foreach (HtmlNode option in select.ChildNodes)
{
    if (option.Name == "option")
    {
        options.Add(option);
    }
}

Now I have a list of all the "options" for the select element. What properties do I need to access to get the key and the text?

For example, if the HTML for one option is:

<option class="level-1" value="1">Funky Town</option>

I want to get as output:

1 - Funky Town

Thanks

Edit: I just noticed something. When I got the child nodes of the "select" element, it returned nodes named "option" and nodes named "#text".

Hmm, the #text node has the string I want, but the option node has the value.

I thought HTMLAgilityPack was an HTML parser? Why did it give me confusing values like this?

Upvotes: 0

Views: 5149

Answers (2)

lincolnk

Reputation: 11238

Edit: you should probably select the option nodes directly via XPath. I think this should work for that:

var options = select.SelectNodes("option");

That will get your options without the text nodes. The options should contain the string you want somewhere; I'm waiting for your HTML sample to confirm.

foreach (var option in options)
{
    int value = int.Parse(option.Attributes["value"].Value);
    string text = option.InnerText;
}

 
You can add some sanity checking on the attribute to make sure it exists.
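
One way that sanity check could look, as a minimal sketch. It assumes the HtmlAgilityPack package is referenced; the helper name OptionReader.TryRead is hypothetical and not part of the library. GetAttributeValue returns the supplied default when the attribute is missing, and int.TryParse avoids an exception on non-numeric values.

```csharp
using HtmlAgilityPack;

public static class OptionReader
{
    // Hypothetical helper: returns false when the option has no usable numeric "value".
    public static bool TryRead(HtmlNode option, out int value, out string text)
    {
        value = 0;
        text = null;
        if (option == null) return false;

        // GetAttributeValue falls back to the default ("") if the attribute is missing.
        string raw = option.GetAttributeValue("value", "");
        if (!int.TryParse(raw, out value)) return false;

        // Decode entities such as &amp; and trim surrounding whitespace.
        text = HtmlEntity.DeEntitize(option.InnerText).Trim();
        return true;
    }
}
```

Inside the loop, options for which TryRead returns false can simply be skipped.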

Upvotes: 0

sisve

Reputation: 19781

This is due to the default configuration of the HTML parser: it registers <option> as HtmlElementFlag.Empty (with the comment 'they sometimes contain, and sometimes they don't...'). The <form> tag has a similar setup (CanOverlap + Empty). Elements flagged this way appear as empty nodes in the DOM, without any child nodes, which is why the option text shows up as a separate #text node next to the option.

You need to remove that flag before parsing the document.

HtmlNode.ElementsFlags.Remove("option");

Note that the ElementsFlags property is static, so any change will affect all further parsing.
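
Putting it together, a rough sketch of the whole flow. It assumes the HtmlAgilityPack package is referenced; the inline HTML is made-up sample markup modelled on the option in the question (the second option is invented for illustration), and the class name SelectOptionsDemo is hypothetical. It is intended to print lines in the "1 - Funky Town" shape the question asks for.

```csharp
using System;
using HtmlAgilityPack;

public static class SelectOptionsDemo
{
    public static void Main()
    {
        // Static setting: affects every document parsed afterwards in this process.
        // Remove is harmless if the key is not present.
        HtmlNode.ElementsFlags.Remove("option");

        // Assumed sample markup, not the asker's real file.
        const string html =
            "<select id=\"cats[]\">" +
            "<option class=\"level-1\" value=\"1\">Funky Town</option>" +
            "<option class=\"level-1\" value=\"2\">Another Town</option>" +
            "</select>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode select = doc.GetElementbyId("cats[]");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection options = select?.SelectNodes("option");
        if (options == null) return;

        foreach (HtmlNode option in options)
        {
            string value = option.GetAttributeValue("value", "");
            Console.WriteLine(value + " - " + option.InnerText.Trim());
        }
    }
}
```

Because the flag change is global, it is probably safest to make it once at startup rather than around individual parse calls.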

Upvotes: 2
