Riyas PK

Reputation: 3217

How to select all text content inside div using XPath?

I want to select all the text inside a div, ignoring any tags nested inside it.

<div>
<p>some text here <a href="">a link here  <span>span here</span></a></p>
</div>

I need the result to be:

some text here a link here span here

I tried this:

response.xpath('//div/text()')

Upvotes: 5

Views: 5239

Answers (3)

kjhughes

Reputation: 111726

You're asking for the string-value of that div:

string(/div)

Or, if you wish whitespace to be trimmed from the ends and consolidated internally:

normalize-space(/div)
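
To make "string-value" concrete, here is a rough sketch that emulates both expressions using only Python's standard library. It is not Scrapy's API: the helper names string_value and normalized are made up for illustration, ElementTree stands in for a real XPath engine, and the question's markup is used with the inner span closed so that it parses as XML. Python's split() also treats a few more characters as whitespace than XPath does, so treat the second helper as an approximation.

```python
import xml.etree.ElementTree as ET

# Markup from the question, with the inner <span> closed so it parses as XML.
SNIPPET = (
    '<div>\n'
    '<p>some text here <a href="">a link here  <span>span here</span></a></p>\n'
    '</div>'
)


def string_value(element):
    """Concatenate every descendant text node in document order.

    This mirrors what XPath's string() returns for an element node.
    (Hypothetical helper, not part of any library.)
    """
    return "".join(element.itertext())


def normalized(element):
    """Approximate normalize-space(): trim the ends, collapse whitespace runs.

    (Hypothetical helper, not part of any library.)
    """
    return " ".join(string_value(element).split())


div = ET.fromstring(SNIPPET)
raw = string_value(div)
clean = normalized(div)

print(repr(raw))
print(clean)
```

The raw string-value keeps the newlines around the p element and the double space inside the link, which is why the normalized form is usually the one you want for display.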

Upvotes: 5

gangabass

Reputation: 10666

Wrap the expression in XPath's string() function:

response.xpath('string(//div)').extract_first()

Upvotes: 2

Agus Mathew

Reputation: 941

This selects every text node under the div, at any depth:

response.xpath('//div//text()')

To get the required output, strip each node and join the non-empty ones:

" ".join(t.strip() for t in response.xpath('//div//text()').extract() if t.strip())
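
One thing to watch with the strip-and-join approach: it puts a space at every tag boundary, so it can split a word that an inline tag runs through. The sketch below shows the difference under the same assumptions as any standard-library emulation: ElementTree's itertext() stands in for //div//text(), and the helper names joined_text and normalized plus the "unbreakable" sample are invented for illustration.

```python
import xml.etree.ElementTree as ET


def joined_text(element):
    """Strip each text node, drop the empty ones, join with single spaces.

    (Hypothetical helper mirroring the join expression above.)
    """
    return " ".join(t.strip() for t in element.itertext() if t.strip())


def normalized(element):
    """Approximate normalize-space() applied to the element's string-value."""
    return " ".join("".join(element.itertext()).split())


# The question's markup (inner <span> closed so it parses as XML).
question_div = ET.fromstring(
    '<div>\n'
    '<p>some text here <a href="">a link here  <span>span here</span></a></p>\n'
    '</div>'
)
# Made-up sample where an inline tag sits in the middle of a word.
inline_div = ET.fromstring('<div>un<b>break</b>able</div>')

same = joined_text(question_div)
split_word = joined_text(inline_div)
whole_word = normalized(inline_div)

print(same)
print(split_word)
print(whole_word)
```

For the markup in the question both approaches give the same string; they only diverge when text nodes meet without whitespace between them.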

Upvotes: 0
