Reputation: 3410
Scope I am trying to parse this page. For those who are not familiar with Portuguese, this page contains all the Subjects from a certain Course (university course), grouped by "Semester".
So, every time you see something like "7º Período Ideal", you can read it as "Subjects from the 7th semester".
Problem I am using an XPath expression to get all the table rows from the table that contains them.
XPath Used : //table[@cellspacing=2]//tr
C# Statement : htmlMap.DocumentNode.SelectNodes("//table[@cellspacing=2]//tr");
The HtmlNodeCollection returned by this C# statement contains only the table row nodes up to the one with the text EAD0648 Gerência de Produtos / Serviços e Mercados, right after the one with 5º Período Ideal.
The XPath //tr "works", but with it I get all the tr's on the page (as expected), and this is not what I want.
Why is the XPath not retrieving the nodes after this node as well?
Is there a cap on the number of nodes retrieved? Am I missing something?
Thanks in advance
Upvotes: 2
Views: 377
Reputation: 53699
I have encountered this in the past: if the tables are not well formed, issues like this occur. I took a very quick look at the HTML for the page and I see what looks like a possible problem: on line 2785 there is a </tr>, and then, without an opening <tr> in between, line 2796 has another </tr>.
I admit that I did not do an in-depth validation to check, but just by looking at it I could not find the matching opening <tr>. I checked this immediately because, as I mentioned, I have faced this exact issue with pages that have malformed tables.
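If you would rather not eyeball the source, a rough check like the one below can list the lines where a </tr> appears with no open <tr> before it. This is only a minimal sketch: the class and method names (TrBalanceChecker, FindStrayClosingRows) are made up for illustration, it assumes you already have the raw HTML as a string, and it is a plain regex count rather than a real HTML parser, so tags inside comments or scripts would be counted too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TrBalanceChecker
{
    // Matches "<tr" and "</tr" only when followed by whitespace or ">",
    // so tags such as <track> are not mistaken for table rows.
    private static readonly Regex TrTag =
        new Regex(@"<(/?)tr(?=[\s>])", RegexOptions.IgnoreCase);

    // Returns the 1-based line numbers of every </tr> that closes
    // a row which was never opened.
    public static List<int> FindStrayClosingRows(string html)
    {
        var strayLines = new List<int>();
        int openRows = 0;
        string[] lines = html.Split('\n');

        for (int i = 0; i < lines.Length; i++)
        {
            foreach (Match tag in TrTag.Matches(lines[i]))
            {
                bool isClosing = tag.Groups[1].Value == "/";
                if (!isClosing)
                    openRows++;              // a new row starts
                else if (openRows > 0)
                    openRows--;              // closes a row that is open
                else
                    strayLines.Add(i + 1);   // nothing open: stray </tr>
            }
        }
        return strayLines;
    }
}
```

For example, calling TrBalanceChecker.FindStrayClosingRows("<table>\n<tr><td>a</td></tr>\n</tr>\n</table>") returns a list containing only 3, the line with the extra </tr>. Note that it reports only stray closing tags; a <tr> without its own </tr> is legal HTML and is not flagged.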
Upvotes: 3