I have something like the following HTML:
<div class="articleBody">
<p>
<strong>Text</strong> lorem ipsum...
<strong>lorem ipsum...</strong>
</p>
<p>lorem ipsum
<strong> lorem ipsum lorem ipsum</strong>
lorem ipsum...lorem ipsum...lorem ipsum...lorem ipsum...
</p>
</div>
More generally, I have a list of <p> tags with a few <strong> tags inside.
I would like to get the text of all the <p> tags without the <strong> markup, and only the text inside the "articleBody" div.
What I have is
response.xpath('string(//div[@class="articleBody"]//p)').extract()
but that only returns the text of the first <p>.
Any help would be appreciated.
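In XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, which is why you only get the first <p>. Loop over the <p> nodes and call string() on each one instead. Give this a shot: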
for node in response.xpath('//div[@class="articleBody"]//p'):
    print(node.xpath('string()').get())
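If you want to try the per-<p> idea without a Scrapy project, here is a minimal sketch using only the standard library's xml.etree.ElementTree. This is a swapped-in parser, not Scrapy's selectors: ElementTree has no string() function, so itertext() stands in for it. It assumes the markup is well-formed XML (the snippet from the question happens to be; real-world HTML often isn't), and the function name paragraph_texts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (it parses as well-formed XML).
HTML = """<div class="articleBody">
<p>
<strong>Text</strong> lorem ipsum...
<strong>lorem ipsum...</strong>
</p>
<p>lorem ipsum
<strong> lorem ipsum lorem ipsum</strong>
lorem ipsum...
</p>
</div>"""


def paragraph_texts(markup):
    """Hypothetical helper: return the full text of every <p> inside
    div.articleBody, one string per <p>."""
    root = ET.fromstring(markup)
    texts = []
    # root.iter() also visits root itself, so this works whether or not
    # the articleBody div is the document root.
    for div in root.iter("div"):
        if div.get("class") != "articleBody":
            continue
        for p in div.iter("p"):
            # itertext() yields the text of <p> and all its descendants in
            # document order, including text inside <strong>; that is the
            # same idea as calling XPath string() on this one node.
            texts.append("".join(p.itertext()))
    return texts


for text in paragraph_texts(HTML):
    print(text)
```

Each <p> yields its own string with the <strong> markup gone but its text kept, which is what the Scrapy loop above does node by node.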
...then you can concatenate the strings or collect them in a list instead of just printing them as I did.
There is also the string-join() function in XPath 2.0, but Scrapy only supports XPath 1.0.
More info about string-join() and related functions: http://www.w3.org/TR/xpath-functions/#func-string-join
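Because Scrapy's selectors are built on lxml, which only evaluates XPath 1.0, the joining has to happen in Python. A rough stand-in for string-join() might look like the sketch below; string_join is a hypothetical name, and collapsing whitespace inside each part is an extra assumption (similar to wrapping each item in normalize-space()), not something string-join() itself does.

```python
def string_join(parts, separator=" "):
    """Hypothetical Python stand-in for XPath 2.0 string-join(sequence, separator).

    Whitespace runs inside each part are collapsed first; XPath's string-join()
    does NOT do this, so that step only mimics applying normalize-space() per item.
    """
    return separator.join(" ".join(part.split()) for part in parts)


# Sample per-<p> strings, like the ones the loop above prints.
paragraphs = [
    "\nText lorem ipsum...\nlorem ipsum...\n",
    "lorem ipsum\n lorem ipsum lorem ipsum\n",
]
print(string_join(paragraphs, separator="\n"))
```

Joining with "\n" keeps one line per paragraph; pass " " instead if you want a single run of text.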