Reputation: 1194
My HTML looks like the following:
<h4>
<span>Cat</span>
<span>Dog</span>
<a href="xxx" class="telcat">Potatoes</a>
</h4>
I am trying to produce the following string from the above, which is simply the child elements' inner text joined by commas:
Cat,Dog,Potatoes
I tried something like:
string x = String.Join(",", htmldoc.DocumentNode.SelectNodes("//h4").Elements().Select(el => el.InnerText).ToList());
However, the output is not what I expected. The string I get looks like:
,Cat,
,Dog,
,Potatoes,
Upvotes: 0
Views: 389
Reputation: 5197
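That's because the HTML contains text nodes that hold only whitespace (the line breaks between the child elements), and Elements() returns them along with the actual elements. Fixing that is rather easy though: you just have to filter out the empty text.
Like so: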
string x = String.Join(",", doc.DocumentNode
    .SelectNodes("//h4").Elements()
    .Select(el => el.InnerText)
    .Where(text => !string.IsNullOrWhiteSpace(text)));
If you want something like that for the whole page, I posted something similar here.
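As an alternative to filtering afterwards, the whitespace-only text nodes can be skipped at selection time by asking XPath for element children only. The following is a minimal sketch, not the approach from the answer above: the class name `H4Text` and the helper `JoinChildText` are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class H4Text
{
    // Hypothetical helper (not from the original post): joins the inner text
    // of the element children of every <h4> with commas.
    public static string JoinChildText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//h4/*" matches only element children of each <h4>,
        // so the whitespace text nodes between them are never returned.
        var nodes = doc.DocumentNode.SelectNodes("//h4/*");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            return string.Empty;
        }

        // Trim guards against stray whitespace inside the elements themselves.
        return string.Join(",", nodes.Select(n => n.InnerText.Trim()));
    }

    public static void Main()
    {
        var html = "<h4>\n<span>Cat</span>\n<span>Dog</span>\n" +
                   "<a href=\"xxx\" class=\"telcat\">Potatoes</a>\n</h4>";
        Console.WriteLine(JoinChildText(html));
    }
}
```

With the HTML from the question this should print Cat,Dog,Potatoes. Unlike the Where filter, this variant keeps elements whose text is empty (they show up as empty entries between commas), so which one fits depends on whether empty children should be dropped.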
Upvotes: 1