Reputation: 87
I have the following HTML:
<div class='article'>
<p>Lorem <strong>ipsum</strong> si ammet</p>
</div>
I want to get the text as: Lorem ipsum si ammet
So I tried:
response.css('div.article >p::text ').extract()
But I only get: Lorem si ammet
How can I get the text of both <p> and <strong> using CSS selectors?
Upvotes: 3
Views: 1653
Reputation: 41
In Scrapy 2.7+ you can do this with the following:
text = response.css('div.article *::text').getall()
text = [t.strip() for t in text]
text = " ".join(t for t in text if t)
getall() returns a list of strings, one per matched text node.
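One thing to watch when joining the pieces: for the HTML in the question, the spaces between the words sit at the edges of the individual text nodes, so how you strip and join matters. Below is a minimal sketch of a join that keeps the word boundaries. The helper name `join_text_nodes` is made up for illustration, and the list is hard-coded to what I'd expect the selector to return for that HTML, so the snippet runs without Scrapy.

```python
def join_text_nodes(nodes):
    """Concatenate text nodes, then collapse whitespace runs to single spaces."""
    # Join first so spaces at node edges survive, then split/join to
    # normalize newlines and indentation and trim both ends.
    return " ".join("".join(nodes).split())


# Assumed result of response.css('div.article *::text').getall()
# for the HTML in the question (hard-coded for illustration).
nodes = ["Lorem ", "ipsum", " si ammet"]

print(join_text_nodes(nodes))
```

In a spider you would pass the result of getall() straight in, e.g. join_text_nodes(response.css('div.article *::text').getall()).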
Upvotes: 1
Reputation: 21351
A one-line solution:
" ".join(a.strip() for a in response.css("div.article *::text").extract())
div.article * matches every element inside div.article, so *::text returns the text nodes of all of them.
Or, written out as a loop:
parts = []
for a in response.css("div.article *::text").extract():
    parts.append(a.strip())
text = " ".join(parts)
Both approaches give the same result.
Upvotes: 4