TarunG

Reputation: 610

Unable to figure out XPath in HtmlAgilityPack

I have been trying to build my first C# application (one that can do more than just say "Hello world").

The HTML file has lots of tags (but only the two h4 tags shown below). Here is the part that I am interested in:

<table width="100%" height="400" border="0" align="center" cellpadding="0" cellspacing="0" bordercolor="#111111" background="images/page_bg.gif" style="BORDER-COLLAPSE: collapse">

<tbody valign="top">
<tr>
<td>

<table width="80%" border="0" valign=top background="images/page_bg.gif">
 <tr>
 <td>

  <div align="center">
   <h4 align="center">
      <font face="Verdana, Arial, Helvetica, sans-serif" size="2">
      <b>
      <font size="4" face="Arial, Helvetica, sans-serif">
      UNWANTED TEXT
       </font></b></font></h4>

  <p><br />
  Name  :  {NAME HERE} <br>Number : {NUMBERS HERE}<br>Number2 : {NUMBERS2}<br><br><h4>UNWANTED TEXT</h4><br>detail NO.  :  <span class=style7>{NUmbers3}</span><br><br><a href=http://test.xom>UNWANTED TEXT</a><br><br>                    
  </p>
  <p class="content"><em><strong>
  <p>&nbsp;</p>

I want to extract NAME, NUMBERS, NUMBERS2 and Numbers3, so I guess I have to do something like this:

 //div[@align = "centre"]/h4/followingsibling::Text();

But that is surely incomplete. Any ideas on how I should do it? I got this XPath from Firebug: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td/table/tbody/tr[2]/td/div/table/tbody/tr/td/table/tbody/tr/td/div/h4

I have also tried the following (just to get the raw data first, then trim it further):

 HtmlNodeCollection node = doc.DocumentNode.SelectNodes("//table[@height='400']//div[@align='centre']//p");
            foreach(HtmlNode node1 in node)    
                textBox1.Text += node1.InnerText;

But node comes back as null here. Any help is greatly appreciated.

Upvotes: 2

Views: 2668

Answers (2)

VikciaR

Reputation: 3412

Firefox adds a tbody tag to tables (the tag can be absent from the original HTML), so a path copied from Firebug may not match the source. I would suggest not writing out the whole path: find the most distinctive part of it and use //. For example, //div[@class='data']/table//tr/td
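
To illustrate the point, here is a minimal sketch. It assumes the HtmlAgilityPack package is referenced and that C# top-level statements are available; the sample markup and the Matches helper are made up for the demonstration and are not taken from the question.

```csharp
using System;
using HtmlAgilityPack;

// Made-up markup with no <tbody>, the way a server might actually send it.
string sampleHtml =
    "<html><body><table><tr><td>" +
    "<div align=\"center\"><h4>Title</h4></div>" +
    "</td></tr></table></body></html>";

// Hypothetical helper: true when the XPath selects at least one node.
// SelectSingleNode returns null when nothing matches.
static bool Matches(string markup, string xpath)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(markup);
    return doc.DocumentNode.SelectSingleNode(xpath) != null;
}

// Browser-style absolute path that expects a tbody element.
bool absoluteHit = Matches(sampleHtml, "/html/body/table/tbody/tr/td/div/h4");

// Short relative path anchored on a distinctive attribute.
bool relativeHit = Matches(sampleHtml, "//div[@align='center']/h4");

Console.WriteLine($"absolute: {absoluteHit}, relative: {relativeHit}");
```

With markup like this I would expect only the relative path to match, because HtmlAgilityPack parses the HTML as written and does not add the tbody that a browser inserts.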

Upvotes: 4

R. Martinho Fernandes

Reputation: 234354

Did you notice that you have @align="centre" but the HTML has align="center" (as in, British vs US spelling)?
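
A quick way to see the effect of the spelling, as a sketch only: it assumes the HtmlAgilityPack package and C# top-level statements, and the trimmed-down markup, the sample values and the TextOf helper are invented for illustration. It also guards against the null that SelectNodes returns when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

// Made-up, trimmed-down stand-in for the markup in the question.
string sampleHtml =
    "<table height=\"400\"><tr><td><div align=\"center\">" +
    "<h4>UNWANTED TEXT</h4>Name : Alice<br>Number : 42" +
    "</div></td></tr></table>";

// Hypothetical helper: concatenates InnerText of every matching node,
// or returns "" when SelectNodes finds nothing (it returns null then).
static string TextOf(string markup, string xpath)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(markup);
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
    if (nodes == null) return "";

    string text = "";
    foreach (HtmlNode n in nodes)
        text += n.InnerText;
    return text;
}

// XPath string comparison is exact, so 'centre' does not match align="center".
string misspelled = TextOf(sampleHtml, "//table[@height='400']//div[@align='centre']");
string corrected = TextOf(sampleHtml, "//table[@height='400']//div[@align='center']");

Console.WriteLine($"misspelled length: {misspelled.Length}");
Console.WriteLine($"corrected: {corrected}");
```

The same null check would apply before the foreach loop in the question, so a non-matching XPath shows up as an empty result instead of an exception.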

Upvotes: 3
