Ben

Reputation: 669

Selecting HTML option values individually using HTMLAgilityPack

I'm trying to get the values of an option list as individual items, but this code is instead just grabbing the entire list into one element. Here is the code I'm using:

List<string> chapterTitles = new List<string>();
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(htmlContent);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option"))
    chapterTitles.Add(node.InnerText);

What happens is that the first element in chapterTitles contains the text of every option, rather than the list holding (for example) 12 separate entries for an option list with 12 values.

Here is the HTML segment that I'm trying to parse:

<SELECT id=chap_select title="Chapter Navigation" Name=chapter onChange="self.location = '/s/5231611/'+ this.options[this.selectedIndex].value + '/Behind-Enemy-Lines-I-Light-Hammer';"><option  value=1 selected>1. Prologue<option  value=2 >2. Chapter One<option  value=3 >3. Chapter Two<option  value=4 >4. Chapter Three<option  value=5 >5. Chapter Four<option  value=6 >6. Chapter Five<option  value=7 >7. Chapter Six<option  value=8 >8. Chapter Seven<option  value=9 >9. Chapter Eight<option  value=10 >10. Chapter Nine<option  value=11 >11. Chapter Ten<option  value=12 >12. Chapter Eleven</select>

Any suggestions?

Upvotes: 1

Views: 1966

Answers (1)

Oscar Mederos

Reputation: 29843

HtmlAgilityPack doesn't seem to handle that markup well, because the option tags are never closed. For example, the code

<option  value=3 >3. Chapter Two<option  value=4 >...

should really be

<option value="3">3. Chapter Two</option>
<option value="4">...

So, to parse the markup as it is, I suggest doing the following:

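var doc = new HtmlDocument();
doc.LoadHtml(htmlContent); // htmlContent holds the page source, as in the question
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option")) {
    // The label is not inside the option node; it is the text node that follows it.
    chapterTitles.Add(node.NextSibling.InnerText);
}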

The two main differences:

  1. I removed the HtmlNode.ElementsFlags.Remove("option"); call.
  2. The text is found in the node that follows each option node (instead of inside it).
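
If you want something a bit more defensive, here is a minimal sketch of the same next-sibling approach. It guards against SelectNodes returning null when nothing matches and against an option that has no text after it. The class name ChapterTitles and the method Extract are hypothetical names of mine, not part of HtmlAgilityPack. Setting the "option" flag explicitly is also an assumption on my part: it pins the behaviour the loop relies on in case the default differs in your version of the library, and it is a static setting, so it affects every document parsed afterwards.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ChapterTitles
{
    // Hypothetical helper (not part of HtmlAgilityPack): returns the text that
    // follows each <option> inside <select id="chap_select">.
    public static List<string> Extract(string htmlContent)
    {
        // Assumption: treat "option" as an empty element so that its label is
        // parsed as a sibling text node. This is a process-wide static setting.
        HtmlNode.ElementsFlags["option"] = HtmlElementFlag.Empty;

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlContent);

        var titles = new List<string>();
        var options = doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option");
        if (options == null)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            return titles;
        }

        foreach (HtmlNode option in options)
        {
            HtmlNode next = option.NextSibling;
            // Only use the sibling when it is a text node; skip options with no label.
            if (next != null && next.NodeType == HtmlNodeType.Text)
            {
                titles.Add(HtmlEntity.DeEntitize(next.InnerText).Trim());
            }
        }

        return titles;
    }
}
```

You would call it as List<string> chapterTitles = ChapterTitles.Extract(htmlContent); and, under the assumptions above, it is intended to give one entry per option.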

Upvotes: 2
