Reputation: 669
I'm trying to get the values of an option list as individual items, but this code is instead just grabbing the entire list into one element. Here is the code I'm using:
List<string> chapterTitles = new List<string>();
HtmlNode.ElementsFlags.Remove("option");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlContent);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option"))
    chapterTitles.Add(node.InnerText);
What happens is that the first element in chapterTitles contains the entire set of option values, rather than the list having one entry per option (for example, 12 entries for an option list with 12 values).
Here is the HTML segment that I'm trying to parse:
<SELECT id=chap_select title="Chapter Navigation" Name=chapter onChange="self.location = '/s/5231611/'+ this.options[this.selectedIndex].value + '/Behind-Enemy-Lines-I-Light-Hammer';"><option value=1 selected>1. Prologue<option value=2 >2. Chapter One<option value=3 >3. Chapter Two<option value=4 >4. Chapter Three<option value=5 >5. Chapter Four<option value=6 >6. Chapter Five<option value=7 >7. Chapter Six<option value=8 >8. Chapter Seven<option value=9 >9. Chapter Eight<option value=10 >10. Chapter Nine<option value=11 >11. Chapter Ten<option value=12 >12. Chapter Eleven</select>
Any suggestions?
Upvotes: 1
Views: 1966
Reputation: 29843
HtmlAgilityPack doesn't seem to parse that code really well. For example, the code
<option value=3 >3. Chapter Two<option value=4 >...
should really be
<option value="3">3. Chapter Two</option>
<option value="4">...
So, to parse it, I suggest doing the following:
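var doc = new HtmlDocument();
doc.LoadHtml(htmlContent); // Load the HTML code here.
var chapterTitles = new List<string>();
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='chap_select']/option")) {
    // The option is parsed as an empty element, so its text is the following sibling node.
    chapterTitles.Add(node.NextSibling.InnerText);
}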
The main two differences:
1. The call HtmlNode.ElementsFlags.Remove("option"); is removed.
2. The text is read from next to the option nodes (instead of inside).
Upvotes: 2