Reputation: 189
I am trying to extract jokes from a website and I need to get the jokes one by one:
<div class="oneliner"
     itemscope=""
     itemtype="http://schema.org/Article">
    <p>My girl always tells me "Life is about the little things", but I just hate when she talks about her Ex.</p>
What I have come up with so far using XPath is:
jokes = response.xpath('//div[@class="oneliner"]')
With this I am able to select the individual items, but now I want to loop over all occurrences and extract the text of each one (everything between the <p> tags). For this I tried:
for joke in jokes:
    item['joke'] = joke.xpath('//p/text()').extract()
But this gives me all jokes from that page at once instead of going through one by one. Could anyone help me with this?
Upvotes: 0
Views: 125
Reputation: 21446
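You can simply iterate through the joke nodes and yield an item on every iteration. Note the leading dot in './/p': an XPath that starts with '//' is evaluated from the document root even when it is called on a nested selector, which is why your loop returned every joke on the page. Starting the expression with './/' restricts the search to the current node: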
def parse(self, response):
    jokes = response.xpath('//div[@class="oneliner"]')
    for joke in jokes:
        item = dict()
        # The leading dot keeps the search inside the current <div>
        item['joke'] = joke.xpath('.//p/text()').extract()
        yield item
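To try the per-node idea outside of a spider, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors (it is not the Scrapy API). The SAMPLE markup and both function names are made up for illustration, and the input is assumed to be well-formed enough to parse as XML. ElementTree does not accept an absolute '//p' on an element at all, so the "all at once" case is shown by searching from the root instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, modelled on the snippet in the question.
SAMPLE = """
<html><body>
  <div class="oneliner"><p>First joke.</p></div>
  <div class="oneliner"><p>Second joke.</p></div>
</body></html>
""".strip()


def jokes_one_by_one(markup):
    """Return one dict per div.oneliner, each holding only its own <p> text."""
    root = ET.fromstring(markup)
    items = []
    for joke in root.findall(".//div[@class='oneliner']"):
        # The leading "." keeps the search inside the current <div>.
        texts = [p.text for p in joke.findall(".//p")]
        items.append({"joke": texts})
    return items


def jokes_all_at_once(markup):
    """Search from the document root: every <p> on the page in one list."""
    root = ET.fromstring(markup)
    return [p.text for p in root.findall(".//p")]


if __name__ == "__main__":
    print(jokes_one_by_one(SAMPLE))
    print(jokes_all_at_once(SAMPLE))
```

The first function mirrors the loop in the answer: each item contains only the text found under its own node. The second shows what a search anchored at the document root returns, which is the behaviour the question ran into.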
Upvotes: 2