Aron Woost

Reputation: 20668

How to exclude linebreak-only textnodes from text() XPath query?

I want to query all text nodes in my DOM. However, I don't want the "markup linebreak" nodes, which contain nothing but the line break between HTML tags.

So I'm trying to strip all whitespace characters with translate(), as described here, and check whether any characters are left:

/html/body//text()[not(translate(., '&#x9;&#xA;&#xD;', '') = '')]

This doesn't work, since it doesn't seem to be possible to check against empty strings (which kind of makes sense, since it's not a text node then).

Is there another approach to filter out these nodes?

Upvotes: 2

Views: 1189

Answers (1)

Dimitre Novatchev

Reputation: 243449

Use:

/html/body//text()[normalize-space()]

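This selects every text-node descendant of /html/body whose string value is non-empty after normalization. A string used as a predicate is converted to a boolean, and only a non-empty string is true, so whitespace-only text nodes are excluded.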

The above expression uses the standard XPath function normalize-space(), which takes a string (or, when called with no argument, the string-value of the context node) and returns a copy in which all leading and trailing whitespace characters are removed and every run of adjacent whitespace characters is replaced by a single space.
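
To see the filter in action, here is a minimal sketch that assumes Java's built-in javax.xml.xpath API and a small made-up sample document. The class name NonBlankTextNodes and the helper selectText are illustrative only and are not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NonBlankTextNodes {

    // Hypothetical helper: returns the values of all nodes matched by the expression.
    static List<String> selectText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Sample markup (an assumption for illustration) with line breaks between tags.
        String xml = "<html><body>\n  <p>Hello</p>\n  <p> big   world </p>\n</body></html>";

        // Unfiltered: three whitespace-only nodes plus the two paragraph texts.
        System.out.println(selectText(xml, "/html/body//text()").size()); // prints 5

        // Filtered: only the nodes "Hello" and " big   world " remain.
        System.out.println(selectText(xml, "/html/body//text()[normalize-space()]"));
    }
}
```

Note that the predicate only decides which nodes are selected. The returned text nodes keep their original content, so " big   world " still has its surrounding and inner spaces; call normalize-space() on each value separately if the cleaned-up string is needed.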

Upvotes: 3
