Reputation: 1697
I am developing a .NET console application.
I want to request an HTML page and then extract some data from it.
I use the Html Agility Pack to build an object model from the response HTML page and to select nodes with XPath.
Here is an extract of the response HTML page:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- ... -->
<body>
<div class="conteneur">
<!-- ... -->
<div class="page">
<div class="inter_page">
<!-- ... -->
<form action="missions.html" method="post" id="formliste">
<table class="tbl_deco_mini" cellspacing="0" style="width: 30%; margin: 0px;">
<tr>
<!-- ... -->
</tr>
<tr>
<td colspan="2" class="td">
<div class="inliste">
<p class="ligne_epee">
<a id="3"></a><a href="http://ffta.mimigyaru.com/missions,affiche_001-moisson-dherbe.html#3"
class="simple">
<img src="http://ffta.mimigyaru.com/medias/divers/mission_batail.png" alt="Moisson d'herbe"
class="img_middle" title="Moisson d'herbe" /></a> <a href="http://ffta.mimigyaru.com/missions,affiche_001-moisson-dherbe.html#3">001-Moisson
d'herbe</a>
</p>
<!-- ... -->
</div>
</td>
</tr>
<tr>
<!-- ... -->
</tr>
</table>
</form>
</div>
</div>
<!-- ... -->
</div>
</body>
</html>
I want to select the <table> node which is the first child of the <form> node.
I have written the following code:
HtmlDocument l_missionsDoc = new HtmlDocument();
l_missionsDoc.Load(l_stream);
XPathNavigator l_navigator = l_missionsDoc.CreateNavigator();
XPathNodeIterator l_iterator = l_navigator.Select("//form[@id='formliste']/table");
if (l_iterator.Count <= 0) continue;
l_iterator.Count is equal to 0, but it should be equal to 1.
What is wrong with my XPath selection?
Any help will be greatly appreciated.
Upvotes: 1
Views: 813
Reputation: 138776
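This is because the Html Agility Pack gives the FORM tag special treatment: by default it parses <form> as an empty element, so the <table> ends up as a sibling of the form instead of its child, and the XPath expression matches nothing. The reasons are described here: HtmlAgilityPack -- Does <form> close itself for some reason?
So you need to remove that special treatment, like this (it must happen before any document is loaded):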
// instruct the library to treat FORM like any other tag
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument l_missionsDoc = new HtmlDocument();
l_missionsDoc.Load(l_stream);
XPathNavigator l_navigator = l_missionsDoc.CreateNavigator();
XPathNodeIterator l_iterator = l_navigator.Select("//form[@id='formliste']/table");
if (l_iterator.Count <= 0) continue;
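As a rough, self-contained way to check the fix without the real page, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced; the inline HTML string is a hypothetical stand-in for the response page, and it uses the library's own SelectSingleNode method rather than an XPathNavigator, which is simply an alternative way to run the same XPath.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Treat FORM like any other tag; this must run before any document is loaded.
        HtmlNode.ElementsFlags.Remove("form");

        // Hypothetical markup standing in for the real response page (assumption).
        const string html =
            "<html><body><form action='missions.html' method='post' id='formliste'>" +
            "<table class='tbl_deco_mini'><tr><td>cell</td></tr></table>" +
            "</form></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode table = doc.DocumentNode.SelectSingleNode("//form[@id='formliste']/table");
        Console.WriteLine(table != null ? "table found" : "table not found");
    }
}
```

Commenting out the ElementsFlags line and running it again is a quick way to see whether the installed version of the library still applies the special FORM handling.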
Upvotes: 3