Prathmesh

Reputation: 73

Trying to parse JSON files using Scrapy

I'm trying to parse JSON responses much like this one, but for many latitude/longitude pairs. The crawler requests every page, but it doesn't output any items.

Here is my code:

import scrapy
import json

from tutorial.items import DmozItem
from scrapy.http import Request
from scrapy.contrib.spiders import CrawlSpider, Rule

class DmozSpider(CrawlSpider):
    name = "dmoz"
    allowed_domains = ["proadvisorservice.intuit.com"]

    min_lat = 35
    max_lat = 40
    min_long = -100
    max_long = -90

    def start_requests(self):
        for i in range(self.min_lat, self.max_lat):
            for j in range(self.min_long, self.max_long):
                yield scrapy.Request('http://proadvisorservice.intuit.com/v1/search?latitude=%d&longitude=%d&radius=100&pageNumber=1&pageSize=&sortBy=distance' % (i, j), 
                    meta={'index':(i, j)},
                    callback=self.parse)

    def parse(self, response):
        jsonresponse = json.loads(response.body_as_unicode())

        for x in jsonresponse['searchResults']:
            item = DmozItem()

            item['firstName'] = x['firstName']
            item['lastName'] = x['lastName']
            item['phoneNumber'] = x['phoneNumber']
            item['email'] = x['email']
            item['companyName'] = x['companyName']
            item['qbo'] = x['qbopapCertVersions']
            item['qbd'] = x['papCertVersions']

            yield item

Upvotes: 0

Views: 697

Answers (1)

bosnjak

Reputation: 8614

When using CrawlSpider, you should not override the parse() method. The Scrapy documentation warns:

"When writing crawl spider rules, avoid using parse as callback, since the CrawlSpider uses the parse method itself to implement its logic. So if you override the parse method, the crawl spider will no longer work."

But since you are customizing your spider manually, and not using the CrawlSpider functionality anyway, I would suggest that you don't inherit from it. Instead, inherit from scrapy.Spider:

class DmozSpider(scrapy.Spider):
    ...
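
To check that the URL generation and JSON extraction behave as expected before wiring them into the spider, they can be pulled out into plain functions. This is only a sketch: the helper names build_search_urls and extract_advisors are hypothetical, and it assumes the response body has the shape the question's code expects (a top-level searchResults list). Unlike the original, it uses .get(), so a missing key yields None instead of raising KeyError.

```python
import json

# URL template copied from the question.
SEARCH_URL = (
    "http://proadvisorservice.intuit.com/v1/search"
    "?latitude=%d&longitude=%d&radius=100&pageNumber=1&pageSize=&sortBy=distance"
)

# Item field -> key in each search result, as mapped in the question's parse().
FIELD_MAP = {
    "firstName": "firstName",
    "lastName": "lastName",
    "phoneNumber": "phoneNumber",
    "email": "email",
    "companyName": "companyName",
    "qbo": "qbopapCertVersions",
    "qbd": "papCertVersions",
}


def build_search_urls(min_lat, max_lat, min_long, max_long):
    """Yield one search URL per integer (latitude, longitude) pair."""
    for lat in range(min_lat, max_lat):
        for lng in range(min_long, max_long):
            yield SEARCH_URL % (lat, lng)


def extract_advisors(body):
    """Yield one dict per entry in the response's 'searchResults' list."""
    data = json.loads(body)
    # Assumption: a missing or null 'searchResults' means "no results".
    for result in data.get("searchResults") or []:
        yield {field: result.get(key) for field, key in FIELD_MAP.items()}
```

Inside the scrapy.Spider subclass, start_requests() would then yield a scrapy.Request for each URL from build_search_urls(), and parse() would yield the dicts from extract_advisors(response.text), or populate a DmozItem from each of them.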

Upvotes: 1
