I have made a spider in scrapy that can successfully scrape data from a website.
def parse(self, response):
    for text in response.css('div.row'):
        yield {
            'product': text.css('div.item a.item::text').get(),
            'test1': text.css('div.item span::text')[0].get(),
            'test2': text.css('div.item span::text')[1].get(),
        }
This is not the complete code, but this should be enough to explain the problem.
The problem occurs when the second span is missing, so that text.css('div.item span::text')[1] has nothing to return. It raises IndexError: list index out of range, which makes sense. But how can I check whether the value is missing so I can replace it with a default?
get() has a default parameter, get(default=''), but because I index with [0] and [1], the IndexError is raised before get() is ever called, so the default does not help. I also looked at ternary expressions, but I could not find a way to use one inside what I think is a dictionary.
First store the selection, items = text.css(...), then check len(items) > 0 before you use items[0] and len(items) > 1 before you use items[1]:
def parse(self, response):
    for text in response.css('div.row'):
        # Select once, then guard each index with a length check
        items = text.css('div.item span::text')
        yield {
            'product': text.css('div.item a.item::text').get(),
            'test1': items[0].get() if len(items) > 0 else "",
            'test2': items[1].get() if len(items) > 1 else "",
        }
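If the length checks start to repeat, one option is to call getall() on the selection, which returns a plain list of strings, and read positions through a small helper. The sketch below is only an illustration: value_at is a hypothetical helper name (it is not part of Scrapy or parsel), and the plain lists stand in for the result of text.css('div.item span::text').getall().

```python
def value_at(values, index, default=""):
    """Return values[index], or default when the index is out of range."""
    # value_at is a hypothetical helper, not part of Scrapy or parsel.
    return values[index] if -len(values) <= index < len(values) else default


# Plain lists standing in for text.css('div.item span::text').getall()
both = ["first", "second"]
only_one = ["first"]

print(value_at(both, 1))        # second
print(value_at(only_one, 1))    # prints an empty line (the default "")
print(value_at([], 0, "n/a"))   # n/a
```

Inside the spider this would read values = text.css('div.item span::text').getall() followed by 'test2': value_at(values, 1) in the yielded dictionary.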
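EDIT:
You can also use the CSS pseudo-class :nth-of-type(1) instead of [0] (it counts from 1, not from 0):
'div.item a.item:nth-of-type(1)::text'
Or XPath with [1], which also counts from 1:
'(.//div[@class="item"]/a[@class="item"])[1]/text()'
With either form the selection is simply empty when the element is missing, so get() can return its default instead of raising an error.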
Scrapy uses the parsel module for its selectors, so I created a minimal working example with parsel alone:
text = '''
<div class="item">
<a class="item" href="a">a</a>
<a class="item" href="b">b</a>
</div>
'''
import parsel
s = parsel.Selector(text)
print(s.css('div.item a.item:nth-of-type(1)::text').get('empty')) # a
print(s.css('div.item a.item:nth-of-type(2)::text').get('empty')) # b
print(s.css('div.item a.item:nth-of-type(3)::text').get('empty')) # empty
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[1]/text()').get('empty'))  # a
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[2]/text()').get('empty'))  # b
print(s.xpath('(.//div[@class="item"]/a[@class="item"])[3]/text()').get('empty'))  # empty
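Because both :nth-of-type() and XPath positions count from 1 while Python indexes count from 0, it is easy to be off by one when converting existing [0] / [1] code. A minimal sketch of that conversion follows; nth_css and nth_xpath are hypothetical helper names that only build query strings. Note that :nth-of-type() counts sibling elements of the same tag under one parent regardless of class, so it matches the list index only when, as in the example above, all the sibling a elements are the ones you want.

```python
def nth_css(base, index):
    """Build a CSS query for the element at zero-based `index`."""
    # CSS :nth-of-type() counts from 1, so shift the Python index by one.
    return f"{base}:nth-of-type({index + 1})::text"


def nth_xpath(path, index):
    """Build an XPath query for the node at zero-based `index`."""
    # XPath positions also start at 1; the parentheses apply the position
    # to the whole node set rather than to each parent separately.
    return f"({path})[{index + 1}]/text()"


print(nth_css("div.item a.item", 0))
# div.item a.item:nth-of-type(1)::text
print(nth_xpath('.//div[@class="item"]/a[@class="item"]', 1))
# (.//div[@class="item"]/a[@class="item"])[2]/text()
```

The resulting strings can be passed to css() or xpath() and finished with get('empty') or get(default='').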