Reputation: 467
I want to scrape some data from this website. My spider code is:
# -*- coding: utf-8 -*-
import scrapy
from coder.items import CoderItem
# from scrapy.loader import ItemLoader


class LivingsocialSpider(scrapy.Spider):
    name = "livingsocial"
    allowed_domains = ["livingsocial.com"]
    start_urls = (
        'http://www.livingsocial.com/cities/15-san-francisco',
    )

    def parse(self, response):
        # deals = response.xpath('//li')
        for deal in response.xpath('//li/a//h2'):
            item = CoderItem()
            item['title'] = deal.xpath('text()').extract_first()
            yield item
It works just fine, but the problem is that when I change the loop into
for deal in response.xpath('//li'):
    item = CoderItem()
    item['title'] = deal.xpath('a//h2/text()').extract_first()
    yield item
this, it returns None! Isn't that supposed to be the same?
Upvotes: 1
Views: 50
Reputation: 21446
The issue here is that some of the nodes returned by response.xpath("//li")
don't have any a nodes underneath them, so for those nodes
extract_first() returns None and you get an empty item, since the title is not there.
What you can do is use this XPath instead, which only selects li nodes that actually contain a title:
items = response.xpath('//li[a//h2/text()]')
len(items)
# 1019
titles = [i.xpath("a//h2/text()").extract_first() for i in items]
len([t for t in titles if t])
# 1019
As you can see, every selected li node now has a title.
Upvotes: 2