How to deal with empty fields in scrapy when using keys

Question

I have made a spider in scrapy that can successfully scrape data from a website.

    def parse(self, response):
        for text in response.css('div.row'):
            yield {
                'product': text.css('div.item a.item::text').get(),
                'test1': text.css('div.item span::text')[0].get(),
                'test2': text.css('div.item span::text')[1].get(),
            }

This is not the complete code, but this should be enough to explain the problem.

The problem occurs when the second span is missing, so text.css('div.item span::text')[1] has nothing to return.

This raises IndexError: list index out of range, which makes sense. But how can I check whether the value is missing so that I can replace it with a default?

I know get() has a default parameter, get(default=''), but because I index with [0] and [1] first, that parameter is not available.
I looked into ternary expressions, but I could not find a way to use one inside what I think is a dictionary.

furas · Accepted Answer

First store the result: items = text.css(...).

Then check that len(items) > 0 before you use items[0],
and that len(items) > 1 before you use items[1]. A conditional (ternary) expression works fine as a dictionary value:

    def parse(self, response):
        for text in response.css('div.row'):
            # Select once, then index only when the element exists
            items = text.css('div.item span::text')
            yield {
                'product': text.css('div.item a.item::text').get(),
                'test1': items[0].get() if len(items) > 0 else "",
                'test2': items[1].get() if len(items) > 1 else "",
            }

EDIT:

You can also use the CSS pseudo-class :nth-of-type(1) instead of [0]. Because the index is then part of the selector, get() with a default works again. For example, for a.item:

'div.item a.item:nth-of-type(1)::text'

Or use XPath with [1] (XPath positions start at 1, not 0):

'(.//div[@class="item"]/a[@class="item"])[1]/text()'

Scrapy uses the parsel module for its selectors, so I created a minimal working example with parsel:

    import parsel

    # Sample HTML with two matching links
    text = '''
    <div class="item">
    <a class="item">a</a>
    <a class="item">b</a>
    </div>
    '''

    s = parsel.Selector(text)

    # CSS: :nth-of-type() is 1-based
    print(s.css('div.item a.item:nth-of-type(1)::text').get('empty'))  # a
    print(s.css('div.item a.item:nth-of-type(2)::text').get('empty'))  # b
    print(s.css('div.item a.item:nth-of-type(3)::text').get('empty'))  # empty

    # XPath: positional index is 1-based as well
    print(s.xpath('(.//div[@class="item"]/a[@class="item"])[1]/text()').get('empty'))  # a
    print(s.xpath('(.//div[@class="item"]/a[@class="item"])[2]/text()').get('empty'))  # b
    print(s.xpath('(.//div[@class="item"]/a[@class="item"])[3]/text()').get('empty'))  # empty
