HTMLAgilityPack XPath Expression not fetching all nodes

Question

Scope: I am trying to parse this page. For those who are not familiar with Portuguese, this page lists all the subjects of a certain university course, grouped by semester.

So every time you see something like "7º Período Ideal", you can read it as "Subjects from the 7th semester".

Problem: I am using an XPath expression to get all the table rows from the table that contains them.

XPath Used : //table[@cellspacing=2]//tr

C# Statement : htmlMap.DocumentNode.SelectNodes("//table[@cellspacing=2]//tr");

The HtmlNodeCollection returned by this C# statement contains only the table row nodes up to the one with the text "EAD0648 Gerência de Produtos / Serviços e Mercados", right after the one with "5º Período Ideal".

The following XPath "works", but it returns every tr in the document (as expected), which is not what I want:

//tr

Why is the XPath not retrieving the nodes after this node as well?

Is there a cap on the number of nodes retrieved? Am I missing something?

Thanks in advance

Chris Taylor · Accepted Answer

I have encountered this in the past: if the tables are not well formed, issues like this occur. I took a very quick look at the HTML for the page and I see what looks like a possible problem: on line 2785 there appears to be a closing tag without a matching opening tag, and line 2796 has another.

I admit that I did not do an in-depth validation to check, but just by looking at it I could not find the matching opening tag. I checked this first because, as I mentioned, I have faced this exact issue with pages that contain malformed tables.
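One way to check for this kind of problem without reading the markup by hand is to ask HtmlAgilityPack what it complained about while parsing. The following is only a minimal sketch: the class name TableDiagnostics, its two methods, and the sample markup in the comment are made up for illustration, and it assumes the parser records a stray closing tag in HtmlDocument.ParseErrors.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class TableDiagnostics
{
    // Loads the markup and returns the parse errors the library recorded.
    public static List<HtmlParseError> FindParseErrors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.ParseErrors.ToList();
    }

    // Formats each error with its line and column so it can be compared
    // against the page source (e.g. the lines mentioned in the answer).
    public static IEnumerable<string> Describe(IEnumerable<HtmlParseError> errors)
    {
        return errors.Select(e =>
            $"line {e.Line}, col {e.LinePosition}: {e.Code} - {e.Reason}");
    }
}

// Illustrative input with a closing </td> that has no opening <td>:
//   <table cellspacing="2"><tr><td>a</td></td></tr></table>
```

Running FindParseErrors on the downloaded page source and printing the result of Describe should show whether errors are reported near the row where the SelectNodes result stops. If they are, the rows after that point have probably ended up outside the table in the parsed tree, which would explain why //table[@cellspacing=2]//tr no longer matches them while //tr still does.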
