STWilson

Reputation: 1718

HtmlAgilityPack How to Select Last Child in Wildcard Search

In this question, assume the expression preceding each leading dot is the .DocumentNode of an HtmlAgilityPack.HtmlDocument.

.SelectSingleNode("*[contains(.,'Year Interior:')]")

Results in:

InnerHtml : <table width="822" height="173" class="diy-section-content-table adSpecView-section-content-body-container" border="1" cellspacing="0" cellpadding="0"><tbody><tr><td class="diy-section-content-table-td diy-template-column" valign="top"><ul><li><strong>Year Interior:</strong>2007</li><li>Good Condition</li></ul></td></tr></tbody>

I need the result to be only the last child containing "Year Interior:":

<li><strong>Year Interior:</strong>2007</li>

The HTML I'm searching is inconsistent. "Year Interior:" may be in a <li>, <span>, <td>, <div>, etc., which is why I cannot be more explicit in the search.

How could something like .SelectSingleNode("*[contains(.,'Year Interior:')]") return only the last child containing "Year Interior:" and not the container element?

Of course, I cannot hard-code the depth like this, but it shows the result I need: .SelectSingleNode("*/*/*/*/*/*/*[contains(.,'Year Interior:')]")

Needed Result : InnerHtml: <strong>Year Interior:</strong> 2007

UPDATE: Trying the following is long-winded and close to working except it catches formatting tags, like <strong> and <em>:

(.Descendants() | Where-Object {$_.InnerHtml -like "*Year Interior:*" -and $_.HasChildNodes -eq $false}).ParentNode

In this case, the first parent node is the <strong> tag, so the code becomes even more unwieldy once it also has to check whether the parent is a formatting tag.

Upvotes: 1

Views: 656

Answers (1)

Hung Cao

Reputation: 3208

How about this:

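using System.Linq;
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Nodes whose trimmed text is exactly the label (for the sample markup: the <strong> element and its text node).
var matches = doc.DocumentNode.Descendants().Where(n => !string.IsNullOrEmpty(n.InnerText) && n.InnerText.Trim().Equals("Year Interior:"));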
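
If the goal is the enclosing element (the <li>, <td>, <div>, etc.) rather than the <strong> itself, one possible approach is to find the deepest element whose text contains the label and then climb out of any formatting tags. The following is only a sketch under assumptions: LabelFinder, FindContainer, and the InlineTags list are hypothetical names and choices made for illustration, and the list of formatting tags would need adjusting for the real pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LabelFinder
{
    // Assumption: these tags count as "formatting"; extend or trim the set as needed.
    private static readonly HashSet<string> InlineTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "strong", "em", "b", "i", "u", "font" };

    // Hypothetical helper: returns the innermost non-formatting element
    // whose text contains the label, or null if the label is not found.
    public static HtmlNode FindContainer(HtmlDocument doc, string label)
    {
        // Deepest match: an element that contains the label while none of
        // its child elements contains it.
        HtmlNode node = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element && n.InnerText.Contains(label))
            .FirstOrDefault(n => !n.ChildNodes.Any(
                c => c.NodeType == HtmlNodeType.Element && c.InnerText.Contains(label)));

        // Climb out of formatting tags such as <strong> until a structural element is reached.
        while (node != null && InlineTags.Contains(node.Name))
        {
            node = node.ParentNode;
        }
        return node;
    }
}
```

Called as LabelFinder.FindContainer(doc, "Year Interior:") on the markup from the question, this should land on the <li> element, whose InnerHtml is the fragment the question asks for. The "no child element also contains it" test can also be written in XPath as //*[contains(.,'Year Interior:')][not(*[contains(.,'Year Interior:')])], but that expression still selects the <strong> element, so the climb step is needed either way.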

Upvotes: 1
