user1590636

Reputation: 1194

String.Join the InnerText of HTML child nodes

My HTML looks like the following

<h4>
<span>Cat</span>
<span>Dog</span>
<a href="xxx" class="telcat">Potatoes</a>
</h4>

I am trying to produce the following string from the above, which is simply the child elements' InnerText joined by commas:

Cat,Dog,Potatoes

I tried something like:

 string x = String.Join(",", htmldoc.DocumentNode.SelectNodes("//h4").Elements().Select(el => el.InnerText).ToList());

However, the output is not what I expected. The string I get looks like:

,Cat,
,Dog,
,Potatoes,

Upvotes: 0

Views: 389

Answers (1)

shriek

Reputation: 5197

That's because the HTML contains text nodes (the line breaks between the tags) that hold no visible text. Fixing that is easy: you just have to filter out the whitespace-only text.

Like so:

    string x = String.Join(",", doc.DocumentNode
        .SelectNodes("//h4").Elements()
        .Select(el => el.InnerText)
        .Where(text => !string.IsNullOrWhiteSpace(text)));

If you want something like that for the whole page, I posted something similar here.
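Another option, sketched here rather than taken from the answer above, is to avoid selecting the text nodes in the first place: the XPath `//h4/*` matches only the element children of each `<h4>`, so the line-break text nodes never appear. This is a minimal sketch assuming the HtmlAgilityPack NuGet package is referenced; `ChildTextJoiner` and `JoinChildText` are hypothetical names used for illustration, and the `.Trim()` is an extra precaution for elements whose text carries surrounding whitespace.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

static class ChildTextJoiner // hypothetical helper, for illustration only
{
    // Joins the InnerText of the element children of every <h4> with commas.
    public static string JoinChildText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//h4/*" matches element children only, so the whitespace-only
        // text nodes between the tags are never selected.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//h4/*");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null) return string.Empty;

        // Trim() is an extra safeguard against whitespace inside an element.
        return string.Join(",", nodes.Select(n => n.InnerText.Trim()));
    }

    static void Main()
    {
        string html = @"<h4>
<span>Cat</span>
<span>Dog</span>
<a href=""xxx"" class=""telcat"">Potatoes</a>
</h4>";
        Console.WriteLine(JoinChildText(html)); // Cat,Dog,Potatoes
    }
}
```

Compared with filtering afterwards with `IsNullOrWhiteSpace`, this keeps the intent ("element children only") in the XPath itself, at the cost of a null check on the `SelectNodes` result.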

Upvotes: 1
