Reputation: 906
Here is my Scrapy crawler code. I am trying to extract metadata values from a website. No metadata item appears more than once on a page.
from scrapy import Request, Selector, Spider

class MySpider(Spider):
    name = "courses"
    start_urls = ['http://www.example.com/listing']
    allowed_domains = ["example.com"]

    def parse(self, response):
        hxs = Selector(response)
        #for courses in response.xpath(response.body):
        for courses in response.xpath("//meta"):
            yield {
                'ScoreA': courses.xpath('//meta[@name="atarbur"]/@content').extract_first(),
                'ScoreB': courses.xpath('//meta[@name="atywater"]/@content').extract_first(),
                'ScoreC': courses.xpath('//meta[@name="atarsater"]/@content').extract_first(),
                'ScoreD': courses.xpath('//meta[@name="clearlywaur"]/@content').extract_first(),
            }
        # Follow the listing links and parse each linked page the same way.
        for url in hxs.xpath('//ul[@class="scrapy"]/li/a/@href').extract():
            yield Request(response.urljoin(url), callback=self.parse)
What I am trying to achieve is this: if the value of any of the Scores is an empty string (''), I want to replace it with 0 (zero). I am not sure how to add conditional logic inside the 'yield' block.
Any help is much appreciated.
Thanks
Upvotes: 1
Views: 592
Reputation: 21436
The extract_first() method has an optional parameter for a default value; however, in your case you can just use the or operator:
foo = response.xpath('//foo').extract_first('').strip() or 0
In this case, if extract_first() returns a string without any text, the stripped result is an empty string, which evaluates to False, so the last operand of the expression (0) is used instead.
To convert the string type to something else try:
foo = int(response.xpath('//foo').extract_first('').strip() or 0)
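Applied to the dict in the question, the same idea could look roughly like the sketch below. The helper name score_or_zero is hypothetical (it is not part of Scrapy), and the extracted dict is made-up data standing in for what extract_first() might return on one page, so the snippet runs without a live crawl.

```python
def score_or_zero(raw):
    """Return the stripped text of raw, or 0 if it is None, empty, or whitespace."""
    # (raw or '') guards against None, which extract_first() returns by default
    # when nothing matches; the trailing `or 0` replaces an empty string with 0.
    return (raw or '').strip() or 0


# Stand-in for values extract_first() might return on one page (made-up data).
extracted = {'ScoreA': '85.5', 'ScoreB': '', 'ScoreC': None, 'ScoreD': '  '}

# Build the item the same way the yield block would, one cleaned value per key.
item = {name: score_or_zero(raw) for name, raw in extracted.items()}
print(item)  # {'ScoreA': '85.5', 'ScoreB': 0, 'ScoreC': 0, 'ScoreD': 0}
```

Inside the spider, each entry of the yielded dict would then wrap its own extract_first() call, for example score_or_zero(response.xpath('//meta[@name="atarbur"]/@content').extract_first()).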
Upvotes: 4