How to skip missing data in Scrapy

Question

Let's say I have an HTML page like this:

...
joke23

joke24

joke25

...

As you can see, I don't have a link for joke24 ;)

I want to assign each joke its own link. If a link does not exist, I want to assign None instead.

My code:

import itertools
...
def parse(self, response):
    for joke, link in itertools.zip_longest(response.css('a.hehe'), response.css('a.hrtojoke')):
        yield {
            'name_joke': joke.xpath('span/text()').extract_first(),
            'link_joke': link.css('::attr(href)').extract_first(),
        }
...

As you can guess, this code runs, but not correctly.

Current Output:

...
{'name_joke': 'joke23', 'link_joke': 'link/to/joke23'}
{'name_joke': 'joke25', 'link_joke': 'link/to/joke25'}
error..
...
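`itertools.zip_longest` pairs the two selector lists purely by position. Because it does not look at the page structure, one missing link misaligns every pair after it and leaves a `None` at the end, which would make `link.css(...)` fail. Here is a minimal illustration in which plain strings stand in for the Scrapy selectors (an assumption made for illustration only):

```python
from itertools import zip_longest

# Plain strings stand in for the two selector lists from the question:
# three jokes, but only two links (joke24 has none).
jokes = ["joke23", "joke24", "joke25"]
links = ["link/to/joke23", "link/to/joke25"]

# zip_longest pairs items by position and pads the shorter list with None,
# so joke24 receives joke25's link and joke25 receives None.
pairs = list(zip_longest(jokes, links))
print(pairs)
# [('joke23', 'link/to/joke23'), ('joke24', 'link/to/joke25'), ('joke25', None)]
```

In the question's loop, that trailing `None` is what `link.css(...)` would be called on, raising an `AttributeError`.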

Desired Output:

{'name_joke': 'joke23', 'link_joke': 'link/to/joke23'}
{'name_joke': 'joke24', 'link_joke': None}
{'name_joke': 'joke25', 'link_joke': 'link/to/joke25'}

How can I achieve my goal?

Konstantin · Accepted Answer

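Try this one. Instead of zipping two separate lists, it loops over the joke elements and looks for a link only in the element immediately after each one: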

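def parse(self, response):
    # Loop over the joke elements themselves instead of zipping two lists
    for item in response.xpath('//*[@class="hehe"]'):
        joke = item.xpath('./span/text()').extract_first()
        # Take only the very next sibling element, and only if it is a link;
        # if it is not, the selection is empty and extract_first() returns None
        link = item.xpath('./following-sibling::*[1][@class="hrtojoke"]/@href').extract_first()
        yield {'name_joke': joke, 'link_joke': link}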

OUTPUT:

{'name_joke': 'joke23', 'link_joke': 'link/to/joke23'}
{'name_joke': 'joke24', 'link_joke': None}
{'name_joke': 'joke25', 'link_joke': 'link/to/joke25'}
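The answer relies on one rule: a joke's link, if it exists, is the element immediately after it (`following-sibling::*[1]`), filtered by class. The same rule can be checked without Scrapy. The sketch below uses the standard library's `xml.etree.ElementTree` (a swapped-in parser, not what Scrapy uses) on a hypothetical reconstruction of the page, since the original HTML is not shown. It assumes the markup is well-formed and that jokes and links are siblings under one parent; `pair_jokes` and `HTML` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each class="hehe" element may be directly followed by
# a class="hrtojoke" sibling holding the href; joke24 has no such sibling.
HTML = """
<div>
  <a class="hehe"><span>joke23</span></a>
  <a class="hrtojoke" href="link/to/joke23">link</a>
  <a class="hehe"><span>joke24</span></a>
  <a class="hehe"><span>joke25</span></a>
  <a class="hrtojoke" href="link/to/joke25">link</a>
</div>
"""


def pair_jokes(markup):
    """Pair each class="hehe" element with the href of its next sibling,
    but only when that sibling has class="hrtojoke"; otherwise use None."""
    parent = ET.fromstring(markup.strip())
    children = list(parent)  # element children only, in document order
    items = []
    for i, child in enumerate(children):
        if child.get("class") != "hehe":
            continue
        name = child.findtext("span")
        # Same rule as following-sibling::*[1][@class="hrtojoke"]:
        # inspect exactly one sibling, never look further ahead.
        nxt = children[i + 1] if i + 1 < len(children) else None
        if nxt is not None and nxt.get("class") == "hrtojoke":
            link = nxt.get("href")
        else:
            link = None
        items.append({"name_joke": name, "link_joke": link})
    return items


for item in pair_jokes(HTML):
    print(item)
```

Checking only the next sibling is what keeps a missing link from shifting the later pairs. Searching for the *nearest* following `hrtojoke` instead would wrongly give joke24 the link that belongs to joke25.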
