Reputation: 1718
In this question, assume the node preceding the first dot in each snippet is the .DocumentNode of an HtmlAgilityPack.HtmlDocument.
.SelectSingleNode("*[contains(.,'Year Interior:')]")
Results in:
InnerHtml : <table width="822" height="173" class="diy-section-content-table adSpecView-section-content-body-container" border="1" cellspacing="0" cellpadding="0"><tbody><tr><td class="diy-section-content-table-td diy-template-column" valign="top"><ul><li><strong>Year Interior:</strong>2007</li><li>Good Condition</li></ul></td></tr></tbody>
I need the result to be only the last child containing "Year Interior:":
<li><strong>Year Interior:</strong>2007</li>
The HTML I'm searching is inconsistent. "Year Interior:" may be in <li>, <span>, <td>, <div>, etc., which is why I cannot be more explicit in the search.
How could something like .SelectSingleNode("*[contains(.,'Year Interior:')]") return only the last child containing "Year Interior:" and not the container element?
Of course, I cannot do this, yet it shows the result I need:
.SelectSingleNode("*/*/*/*/*/*/*[contains(.,'Year Interior:')]")
Needed Result : InnerHtml: <strong>Year Interior:</strong> 2007
UPDATE:
The following is long-winded and close to working, except that it catches formatting tags such as <strong> and <em>:
(.Descendants() | Where-Object { $_.InnerHtml -like "*Year Interior:*" -and $_.HasChildNodes -eq $false }).ParentNode
In this case, the first parent node is the <strong> tag, so the code would become even more unwieldy once it also has to check whether the parent is a formatting tag.
Upvotes: 1
Views: 656
Reputation: 3208
How about this:
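// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the page source as a string
// Keep only the nodes whose trimmed text is exactly "Year Interior:"
var matches = doc.DocumentNode.Descendants().Where(n => !string.IsNullOrEmpty(n.InnerText) && n.InnerText.Trim().Equals("Year Interior:"));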
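Note that this filter matches the label node itself (in the sample markup, the <strong> element and its text node), not the element that wraps it. If the wrapping element is what you need, one possible approach is to start from the text node and climb to the nearest ancestor that is not a formatting tag. The sketch below is only an illustration under assumptions: the helper name FindLabelContainer, the sample markup, and the list of formatting tags (strong, em, b, i) are made up for the example and would need adjusting to the real pages. It also assumes the label contains no single quote, since it is pasted into the XPath string.

```csharp
using System;
using HtmlAgilityPack; // NuGet package, the same library used in the question

public static class LabelFinder
{
    // Hypothetical helper: returns the nearest non-formatting ancestor of the
    // first text node containing `label`, or null when nothing matches.
    public static HtmlNode FindLabelContainer(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ancestor:: is a reverse axis, so [1] picks the closest ancestor
        // that passes the filter. The tag list here is an assumption.
        string xpath =
            "//text()[contains(., '" + label + "')]" +
            "/ancestor::*[not(self::strong or self::em or self::b or self::i)][1]";

        return doc.DocumentNode.SelectSingleNode(xpath);
    }

    public static void Main()
    {
        // Sample markup shaped like the fragment in the question.
        const string html =
            "<table><tbody><tr><td><ul>" +
            "<li><strong>Year Interior:</strong>2007</li><li>Good Condition</li>" +
            "</ul></td></tr></tbody></table>";

        HtmlNode node = FindLabelContainer(html, "Year Interior:");

        // SelectSingleNode returns null when there is no match.
        Console.WriteLine(node?.OuterHtml ?? "(no match)");
    }
}
```

Because the search starts at the text node, it does not matter whether the label sits in an <li>, <span>, <td> or <div>; only the list of tags to skip has to be maintained.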
Upvotes: 1