Marcello Grechi Lins

Reputation: 3410

HTMLAgilityPack XPath Expression not fetching all nodes

Scope: I am trying to parse this page. For those who are not familiar with Portuguese, this page lists all the subjects of a certain university course, grouped by semester.

So every time you see something like "7º Período Ideal", you can read it as "Subjects from the 7th semester".

Problem: I am using an XPath expression to get all the table rows from the table that contains them.

XPath Used : //table[@cellspacing=2]//tr

C# Statement : htmlMap.DocumentNode.SelectNodes("//table[@cellspacing=2]//tr");

The HtmlNodeCollection returned by this C# statement contains the table row nodes only up to the one with the text "EAD0648 Gerência de Produtos / Serviços e Mercados", which comes right after the row with "5º Período Ideal".

The following XPath "works", but it returns every tr in the document (as expected), which is not what I want:

//tr

Why is the XPath not retrieving the nodes after this one as well?

Is there a cap on the number of nodes retrieved? Am I missing something?

Thanks in advance

Upvotes: 2

Views: 377

Answers (1)

Chris Taylor

Reputation: 53699

I have encountered this in the past: if the tables are not well formed, issues like this occur. I took a very quick look at the HTML for the page and I see what looks like a possible problem. On line 2785 there is a </tr>, and then, with no opening <tr> in between, line 2796 has another </tr>.
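A rough way to hunt for stray closing tags like this, rather than reading the source by eye, is to scan the raw HTML and count row tags. The sketch below is only an illustration: the class and method names are made up for this example, it uses nothing beyond the .NET base library, and it is a heuristic, not a validator. It ignores comments and scripts, and because HTML allows a </tr> to be omitted, a page that leaves rows unclosed can hide a stray </tr> from it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class TrBalanceChecker
{
    // Returns the 1-based line number of every </tr> that appears
    // while no <tr> is currently open.
    public static List<int> FindStrayClosingRows(string html)
    {
        var strayLines = new List<int>();
        int openRows = 0; // <tr> tags seen and not yet closed
        int line = 1;     // line number of the current match
        int cursor = 0;   // position up to which newlines were counted

        // Matches <tr ...> and </tr ...>; \b keeps tags such as <track> out.
        var rowTag = new Regex(@"<(/?)tr\b[^>]*>", RegexOptions.IgnoreCase);

        foreach (Match match in rowTag.Matches(html))
        {
            // Advance the line counter to the start of this tag.
            for (int i = cursor; i < match.Index; i++)
            {
                if (html[i] == '\n') line++;
            }
            cursor = match.Index;

            bool isClosing = match.Groups[1].Value == "/";
            if (!isClosing)
            {
                openRows++;
            }
            else if (openRows > 0)
            {
                openRows--;
            }
            else
            {
                strayLines.Add(line); // closing tag with nothing to close
            }
        }
        return strayLines;
    }
}
```

Calling TrBalanceChecker.FindStrayClosingRows(File.ReadAllText(path)) on a saved copy of the page (path being wherever you saved it) gives the line numbers to inspect by hand.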

I admit that I did not do an in-depth validation to check, but just by looking at it I could not find the matching opening <tr>. I checked for this first because, as I mentioned, I have faced this exact issue on pages with malformed tables.
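For a fuller check, the parser itself can be asked what it stumbled over. HtmlAgilityPack's HtmlDocument exposes a ParseErrors collection after loading. The sketch below simply lists those entries; the class and method names are hypothetical, it assumes the HtmlAgilityPack NuGet package is referenced, and whether a given malformation is reported at all depends on the library version, so treat an empty list as inconclusive.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

// Hypothetical helper for illustration only.
public static class ParseErrorReport
{
    // Loads the markup and returns one readable line per parse error
    // that HtmlAgilityPack recorded while building the document.
    public static List<string> Describe(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var report = new List<string>();
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            report.Add(
                $"line {error.Line}, col {error.LinePosition}: {error.Code} ({error.Reason})");
        }
        return report;
    }
}
```

If the reported lines fall around the point where the SelectNodes result stops, that supports the malformed-table explanation. In that case the usual options are to repair the markup before querying, or to use a query that does not depend on the broken table keeping all of its rows as descendants.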

Upvotes: 3
