Reputation: 161
I'm facing a problem in my web scraper: I need to get the decimal number inside the cell with class team_a_col home:
<th>Med. goal subiti p/p</th>
<td class='team_a_col total'>0.76</td>
<td class='team_a_col home'>0.89
<td class='team_a_col away'>0.62</td></td>
So the result should be 0.89. But as you can see, the HTML is badly structured, so instead of getting only 0.89 I also get the content of the team_a_col away cell with this code:
node.SelectSingleNode(".//td[@class='team_a_col home']").InnerText.Trim();
How can I get only 0.89? The closing </td> should come before <td class='team_a_col away'>.
Upvotes: 1
Views: 1742
Reputation: 460238
You should set HtmlDocument.OptionFixNestedTags to true:
string html = "<th>Med. goal subiti p/p</th><td class='team_a_col total'>0.76</td><td class='team_a_col home'>0.89<td class='team_a_col away'>0.62</td></td>";
var doc = new HtmlAgilityPack.HtmlDocument
{
OptionFixNestedTags = true,
OptionCheckSyntax = true,
OptionAutoCloseOnEnd = true
};
doc.LoadHtml(html);
string tdText = doc.DocumentNode.SelectSingleNode(".//td[@class='team_a_col home']")?.InnerText.Trim();
With OptionFixNestedTags enabled, the result is: 0.89
Upvotes: 3
Reputation: 360
Could you just take the whole line, then split or substring it to fetch the data?
// "htmlelement" is a placeholder; replace it with the real element names.
var nodes = doc.DocumentNode.SelectNodes("//htmlelement/htmlelement");
string[] nodeArray = nodes[0].OuterHtml.Split(' ');
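One way to make the substring idea concrete is to take the merged InnerText of the cell and keep only the first number in it. This is a minimal sketch, not a replacement for fixing the parse: it assumes the merged text looks something like "0.89\n0.62", and FirstDecimal is a hypothetical helper name, not part of HtmlAgilityPack.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellText
{
    // Hypothetical helper: returns the first decimal number found in the
    // text, or null when the text contains no number at all.
    public static string FirstDecimal(string innerText)
    {
        if (string.IsNullOrEmpty(innerText))
        {
            return null;
        }

        // Optional minus sign, digits, then an optional fractional part.
        Match m = Regex.Match(innerText, @"-?\d+(?:\.\d+)?");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Assumed shape of the merged InnerText from the malformed cell.
        string innerText = "0.89\n0.62";
        Console.WriteLine(FirstDecimal(innerText));
    }
}
```

In the scraper this would be called on the string returned by InnerText. It only works while the wanted value comes first in the merged text, so it is more fragile than letting the parser repair the nesting.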
Upvotes: 0