Python xpath selector getting error

Question

I want to scrape some data from this website. My spider code is:

# -*- coding: utf-8 -*-
import scrapy
from coder.items import CoderItem
# from scrapy.loader import ItemLoader


class LivingsocialSpider(scrapy.Spider):
    name = "livingsocial"
    allowed_domains = ["livingsocial.com"]
    start_urls = (
        'http://www.livingsocial.com/cities/15-san-francisco',
    )

    def parse(self, response):
        # deals = response.xpath('//li')
        for deal in response.xpath('//li/a//h2'):
            item = CoderItem()
            item['title'] = deal.xpath('text()').extract_first()
            yield item

It works just fine, but the problem comes when I change the loop to this:

for deal in response.xpath('//li'):
    item = CoderItem()
    item['title'] = deal.xpath('a//h2/text()').extract_first()
    yield item

With this version it returns None! Aren't the two supposed to be the same?

Granitosaurus · Accepted Answer

The issue here is that some of the nodes matched by response.xpath("//li") don't have any a nodes underneath them. For those nodes the relative XPath matches nothing, extract_first() returns None, and you get an item with an empty title.

What you can do is select only the li nodes that actually contain a title, using this XPath instead:

items = response.xpath('//li[a//h2/text()]')
len(items)
# 1019
titles = [i.xpath("a//h2/text()").extract_first() for i in items]
len([t for t in titles if t])
# 1019

As you can see, every selected li node now has a title.
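The difference between the two loops can be reproduced offline. The following is a minimal sketch, not the spider itself: it uses the standard library's xml.etree.ElementTree in place of Scrapy's selectors so it runs without a network connection, and the markup in DOC is made up for illustration. ElementTree supports only a subset of XPath, so the empty matches are filtered in Python here rather than with the //li[a//h2/text()] predicate shown above.

```python
import xml.etree.ElementTree as ET

# Made-up markup (an assumption, not the real page):
# only two of the three <li> elements contain a deal title.
DOC = """
<ul>
  <li><a href="/deals/1"><div><h2>Deal one</h2></div></a></li>
  <li>Navigation entry without a link</li>
  <li><a href="/deals/2"><h2>Deal two</h2></a></li>
</ul>
"""

root = ET.fromstring(DOC)


def title_of(li):
    """Return the text of the first a//h2 under this <li>, or None."""
    h2 = li.find("a//h2")
    return h2.text if h2 is not None else None


# Looping over every <li> yields None for the ones without a title,
# which mirrors the second loop in the question.
all_titles = [title_of(li) for li in root.iter("li")]

# Skipping the empty matches keeps only the real deals.
deal_titles = [t for t in all_titles if t is not None]

# Selecting the <h2> nodes directly never produces an empty match,
# which mirrors the first loop in the question.
direct_titles = [h2.text for h2 in root.findall(".//li/a//h2")]

print(all_titles)
print(deal_titles)
print(direct_titles)
```

In a Scrapy spider the same idea applies: either restrict the outer XPath to li nodes that contain a title, or check the extracted title and skip the item when it is None.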
