Extract content between <p> tags using XPath for web scraping

Question

I am trying to extract jokes from a website and I need to get the jokes one by one:

<div class="oneliner"
     itemscope=""
     itemtype="http://schema.org/Article">

    <p>My girl always tells me "Life is about the little things", but I just hate when she talks about her Ex.</p>

</div>

What I have come up with so far using XPath is:

.xpath('//div[@class="oneliner"]')

With this I am able to select the individual items, but now I want to loop over all occurrences and extract the text (everything between <p> and </p>). For this I tried:

for joke in jokes:
    item['joke'] = joke.xpath('//p/text()').extract()

But this gives me all jokes from that page at once instead of going through one by one. Could anyone help me with this?

Granitosaurus · Accepted Answer

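Iterate over the joke nodes and yield one item per iteration. The key change is the leading dot in './/p/text()': an XPath that starts with '//' is evaluated from the document root even when it is called on a sub-selector, so it matches every <p> on the page, whereas './/' restricts the search to descendants of the current joke node: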

def parse(self, response):
    jokes = response.xpath('//div[@class="oneliner"]')
    for joke in jokes:
        item = dict()
        item['joke'] = joke.xpath('.//p/text()').extract()
        yield item
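
The difference between the absolute and the relative query can be checked outside a spider. The following is only a minimal sketch: it assumes lxml is installed and uses it in place of Scrapy's response selectors (the XPath semantics are the same), and the sample page and the extract_jokes helper are made up for illustration.

```python
from lxml import html

# Hypothetical sample page: two joke blocks shaped like the markup in the question.
PAGE = """
<html><body>
  <div class="oneliner"><p>First joke.</p></div>
  <div class="oneliner"><p>Second joke.</p></div>
</body></html>
"""


def extract_jokes(page, p_query):
    """Run p_query against every div.oneliner and collect the results per div."""
    tree = html.fromstring(page)
    jokes = tree.xpath('//div[@class="oneliner"]')
    return [joke.xpath(p_query) for joke in jokes]


# Absolute path: evaluated from the document root, so each div sees all paragraphs.
absolute = extract_jokes(PAGE, '//p/text()')

# Relative path: the leading dot limits the search to the current div.
relative = extract_jokes(PAGE, './/p/text()')

print(absolute)
print(relative)
```

With the absolute query every iteration returns both jokes; with the relative query each iteration returns only the joke inside its own div, which is the behaviour the question asks for.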
