XPath to extract all the text in a specific node and return it as one element using Scrapy

Question

So I have this html:



   This is my first sentence
   

   This sentance should be considered as part of the first one.
   

   And this also


   This is the second sentence

I want to extract the text from the p nodes, with all the text in one node returned as a single element. I am using the Scrapy shell like this:

scrapy shell ./path/to/file.html
response.xpath('//p/text()').extract()

The output I get is:

[
'This is my first sentence',
'This sentance should be considered as part of the first one.',
'And this also',
'This is the second sentence'
]

The output I want is:

[
 'This is my first sentence This sentance should be considered as part of the first one. And this also',
 'This is the second sentence'
]

How can I solve this with an XPath expression?

Thank you very much :)

Rodwan Bakkar · Accepted Answer

This solved the issue:

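from w3lib.html import remove_tags
# Select each <p> element as a full HTML string, markup included
two_texts = response.xpath('//p').extract()
# Strip the tags so each <p> becomes one plain-text string
two_texts = [remove_tags(text) for text in two_texts]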
