Raheel

Reputation: 9024

Get all text including html in a single node scrapy xpath

response.xpath('//*[@id="blah"]//text()')

Suppose my HTML is:

<p id="blah">This is a simple text <a href="#">foo</a> and this is after tag. </p>

What happens is that I get a list of text strings even though it is a single <p> tag, such as:

[u'This is a simple text', u' and this is after tag.']

My actual HTML content is huge, and I have to join the pieces to get a single string. I also lose foo when joining. Is there a specific XPath or Scrapy mechanism for doing this?

I want to get the result: This is a simple text foo and this is after tag.

Please note that foo is included here too.

Thanks

Upvotes: 1

Views: 440

Answers (2)

Andersson

Reputation: 52665

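You can get all text nodes as a single string as below (note that text_content() is a method of lxml.html elements, so this assumes the selected object is an lxml element rather than a Scrapy Selector):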

response.xpath('//*[@id="blah"]')[0].text_content()

Output:

'This is a simple text foo and this is after tag. '

Upvotes: 1

gtosto

Reputation: 1341

If XPath 2.0 is available, you can use the string-join function, which takes the sequence of text nodes and a separator:

response.xpath('string-join(//*[@id="blah"]//text(), "")')
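Scrapy's selectors are built on lxml, which implements XPath 1.0, so string-join may not be available there. The fallback is to collect the descendant text nodes and join them in Python. The sketch below uses the standard library's ElementTree only to stay self-contained; the <body> wrapper and the variable names are assumptions added for illustration. In Scrapy the equivalent would likely be ''.join(response.xpath('//*[@id="blah"]//text()').getall()).

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a hypothetical <body> root
# so there is a parent element to search from
doc = ET.fromstring(
    '<body><p id="blah">This is a simple text <a href="#">foo</a>'
    ' and this is after tag. </p></body>'
)

# Locate the element by its id attribute
node = doc.find(".//*[@id='blah']")

# itertext() walks all descendant text in document order,
# including the text inside the nested <a> element
pieces = list(node.itertext())

# Join with an empty separator to rebuild the full sentence
joined = "".join(pieces)

print(pieces)
print(repr(joined))
```

Because every descendant text node is visited, foo is kept in the joined result. Selecting only direct child text (for example with /text() instead of //text()) would skip it.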

Upvotes: 1
