Alex Silachev

Using XPath to get text of paragraph with links inside

I'm parsing an HTML page with XPath and want to grab the whole text of a specific paragraph, including the text of any links inside it.

For example, I have the following paragraph:

<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>

I need to get the following text as the result: "This is sample paragraph with link inside", but applying "//p[@class='main-content']/text()" gives me only "This is sample paragraph with inside".

Could you please assist? Thanks.

Answers (1)

lonesomeday

To get the whole text content of a node, use the string() function:

string(//p[@class="main-content"])
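
A minimal sketch of evaluating this from Python, assuming the third-party lxml library is installed (any XPath 1.0 engine should give the same result). The variable names here are only for illustration. Because the paragraph's markup contains a newline and indentation, the raw string value carries that whitespace too; wrapping the path in normalize-space() instead trims the ends and collapses internal runs of whitespace.

```python
from lxml import html  # assumption: lxml is available (pip install lxml)

# The sample markup from the question.
SNIPPET = """<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>"""

# document_fromstring wraps the fragment in <html><body> and returns the root.
doc = html.document_fromstring(SNIPPET)

# string() concatenates every descendant text node, so "link" is included,
# along with the newline and indentation from the source markup.
raw = doc.xpath('string(//p[@class="main-content"])')

# normalize-space() takes the same string value, trims it and
# collapses internal whitespace runs to single spaces.
clean = doc.xpath('normalize-space(//p[@class="main-content"])')

print(repr(raw))
print(clean)
```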

Note that this returns a single string value. If you want the text nodes themselves (as returned by text()), you need to select text descendants at all depths, not just direct children:

//p[@class="main-content"]//text()

This returns three text nodes: "This is sample paragraph with ", "link" and " inside." (each including any surrounding whitespace from the markup).
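
A short sketch contrasting the two expressions, again assuming lxml is installed; the variable names are hypothetical. It shows why /text() misses the link (it only selects the paragraph's own text children), while //text() also picks up the text node inside <a>, and how joining those nodes rebuilds the full text.

```python
from lxml import html  # assumption: lxml is available (pip install lxml)

# The sample markup from the question.
SNIPPET = """<p class="main-content">
    This is sample paragraph with <a href="http://google.com">link</a> inside.
</p>"""

doc = html.document_fromstring(SNIPPET)

# /text() selects only the <p>'s own text children, so the link text is missing.
direct = doc.xpath('//p[@class="main-content"]/text()')

# //text() selects text nodes at every depth, including the one inside <a>.
nodes = doc.xpath('//p[@class="main-content"]//text()')

# Joining the nodes in document order rebuilds the full text;
# split() plus " ".join collapses whitespace much like normalize-space().
full_text = " ".join("".join(nodes).split())

print([n.strip() for n in direct])
print([n.strip() for n in nodes])
print(full_text)
```

The whitespace-collapsing step is only needed because the source markup is indented; if whitespace does not matter, "".join(nodes) alone gives the same text as string().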
