Mike

Reputation: 31

How do I retrieve all text in an HTML DOM but exclude SCRIPT and STYLE tags?

I know how to quickly extract text nodes from a DOM:

document.evaluate('//text()', document, null, XPathResult.ANY_TYPE, null)

But is there an easy way to exclude text from SCRIPT, STYLE, or other tags that are not shown to the user?

Something like:

'//text()[ parent.name not in ("SCRIPT", "STYLE") ]'

Thanks, Mike

Upvotes: 3

Views: 1313

Answers (2)

user357812

Besides Nick Jones's correct answer, for more complex exclusions you should use the XPath 1.0 node-set exclusion expression, which selects the nodes of $ns1 that are not in $ns2:

$ns1[not(count(.|$ns2)=count($ns2))]

In this case:

//*[not(count(.|//script|/*/*/style)=count(//script|/*/*/style))]/text()
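To avoid typing the excluded set twice, the expression can be assembled from its two parts. This is only a sketch: `excludeNodeSet` is a hypothetical helper name, not part of any API, and it assumes the second argument is a plain path or a union of paths. The first node set is wrapped in parentheses so the predicate applies to the whole set; for this predicate that selects the same nodes as the unparenthesized form above.

```javascript
// Hypothetical helper: builds the XPath 1.0 string for "ns1 minus ns2".
// A node is in ns2 exactly when adding it to ns2 does not change the count.
function excludeNodeSet(ns1, ns2) {
  return `(${ns1})[not(count(.|${ns2})=count(${ns2}))]`;
}

// The elements whose text should be skipped, as in the answer above.
const excluded = '//script|/*/*/style';

// Text nodes whose parent element is not in the excluded set.
const xpath = excludeNodeSet('//*', excluded) + '/text()';
```

The resulting string can then be passed to document.evaluate in place of '//text()'.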

Upvotes: 1

Nick Jones

Reputation: 6493

//*[not(self::script or self::style)]/text()
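For completeness, here is a rough sketch of running that expression and collecting the strings. `collectText` is a made-up name, and the sketch assumes an HTML document, where the lowercase names `script` and `style` match without a namespace resolver. It excludes by tag name only, so text hidden through CSS would still be returned.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written out so
// the function does not rely on a global XPathResult being defined.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: returns the value of every text node whose parent
// is not a script or style element, in document order.
function collectText(doc) {
  const xpath = '//*[not(self::script or self::style)]/text()';
  const result = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const texts = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    texts.push(result.snapshotItem(i).nodeValue);
  }
  return texts;
}

// In a browser console: collectText(document).join('')
```

A snapshot result is used here rather than ANY_TYPE because a snapshot stays valid if the page changes while you loop over it.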

Upvotes: 7
