Extract all elements from within p tag scrapy

Question

I am using Scrapy to scrape a website that has a structure similar to the following:


    ...
    <p>
        Some text
    </p>
    <p>
        <span>
            More Text
            <br>
            Another Text
        </span>
    </p>

I am able to scrape all the text inside the different <p> tags with something like //p//text().extract(). The problem is that this splits the text nodes inside the same tag into separate elements in the result:

'text': ['Some text', 'More Text', 'Another Text']

And ideally I would need it like this:

'text': ['Some text', 'More Text Another Text']

Is it possible to get the result like that?

stasdeep · Accepted Answer

In these cases I do the following trick:

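text = [
    # Join the non-empty, stripped text nodes of each <p> with a single space
    ' '.join(
        line.strip()
        for line in p.xpath('.//text()').extract()
        if line.strip()
    )
    # One entry per <p> element in the response
    for p in response.xpath('//p')
]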

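This selects each <p> separately, collects every text node beneath it, and joins them with a single space, so the result contains one string per <p>: ['Some text', 'More Text Another Text'].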
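The same idea can be tried outside a spider. The following is a minimal sketch, not part of the original answer: the helper names join_text_nodes and paragraphs_text and the sample HTML in the comments are hypothetical, and it assumes Scrapy is installed so that scrapy.selector.Selector can be imported.

```python
def join_text_nodes(nodes):
    """Collapse raw text nodes into one space-separated string.

    Whitespace-only nodes (e.g. the newlines between tags) are dropped.
    """
    return ' '.join(s.strip() for s in nodes if s.strip())


def paragraphs_text(html):
    """Return one joined string per <p> element found in the given HTML."""
    # Imported lazily so join_text_nodes stays usable without Scrapy installed.
    from scrapy.selector import Selector

    sel = Selector(text=html)
    return [
        join_text_nodes(p.xpath('.//text()').getall())
        for p in sel.xpath('//p')
    ]


# Hypothetical usage (sample markup is an assumption, not the real page):
#   html = "<p>Some text</p><p><span>More Text<br>Another Text</span></p>"
#   paragraphs_text(html)
```

Splitting the join into its own function keeps the string handling testable without a live response; inside a spider, response.xpath('//p') can be used in place of the Selector built here.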
