eroma934

Reputation: 339

JSON Response and Scrapy

I'm trying to parse a JSON response from the New York Times API with Scrapy and write it to CSV, so that I have a summary of all articles related to a particular query. The CSV should contain the link, publication date, summary, and title, so that I can run a few keyword searches on the summary text. I'm new to both Python and Scrapy; here's my spider, which currently fails with an HTTP 400 error. I've replaced my API key with xxx in the spider:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from nytimesAPIjson.items import NytimesapijsonItem
import json
import urllib2

class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["http://api.nytimes.com/svc/search/v2/articlesearch"]
    req = urllib2.urlopen('http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx)

      def json_parse(self, response):
          jsonresponse= json.loads(response)

          item = NytimesapijsonItem()
          item ["pubDate"] = jsonresponse["pub_date"]
          item ["description"] = jsonresponse["lead_paragraph"]
          item ["title"] = jsonresponse["print_headline"]
          item ["link"] = jsonresponse["web_url"]
          items.append(item)
          return items

If anybody has any ideas or suggestions, including ones outside of Scrapy, please let me know. Thanks in advance.

Upvotes: 1

Views: 3087

Answers (1)

alecxe

Reputation: 473763

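You should set start_urls and implement the parse method. Scrapy requests each URL in start_urls itself and passes the response to parse, so there is no need to call urllib2. Note also that allowed_domains takes bare domain names, not URLs: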

from scrapy.spider import BaseSpider
import json


class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["api.nytimes.com"]
    start_urls = ['http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx']

    def parse(self, response):
        # response is a Scrapy Response object; the JSON text is in response.body
        jsonresponse = json.loads(response.body)

        print jsonresponse
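
Once the JSON is decoded, the fields from the question (link, publication date, summary, title) still have to be pulled out of each article and written as CSV. Here is a minimal sketch of that step in plain Python. It assumes the articles sit in a list under response -> docs and that the title is nested under a headline object; that layout, the SAMPLE data, and the helper names docs_to_rows and rows_to_csv are all illustrative assumptions, so check them against a real reply before relying on them.

```python
import csv
import io
import json

# Hypothetical sample shaped like an Article Search reply (assumed layout,
# not real data): articles are listed under response -> docs.
SAMPLE = json.dumps({
    "response": {
        "docs": [
            {
                "web_url": "http://www.nytimes.com/example-1",
                "pub_date": "2013-09-16T00:00:00Z",
                "lead_paragraph": "First example paragraph.",
                "headline": {"main": "Example One",
                             "print_headline": "Example One (print)"},
            },
            {
                "web_url": "http://www.nytimes.com/example-2",
                "pub_date": "2013-09-15T00:00:00Z",
                "lead_paragraph": None,
                "headline": {"main": "Example Two"},
            },
        ]
    }
})

# Column names follow the item fields used in the question.
FIELDS = ["link", "pubDate", "title", "description"]


def docs_to_rows(raw_json):
    """Turn the raw JSON text into one flat dict per article."""
    data = json.loads(raw_json)
    rows = []
    for doc in data.get("response", {}).get("docs", []):
        headline = doc.get("headline") or {}
        rows.append({
            "link": doc.get("web_url", ""),
            "pubDate": doc.get("pub_date", ""),
            # Fall back to the main headline when no print headline exists.
            "title": headline.get("print_headline") or headline.get("main", ""),
            # Missing or null paragraphs become empty strings.
            "description": doc.get("lead_paragraph") or "",
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(docs_to_rows(SAMPLE)))
```

Inside the spider, the same idea would be to call something like docs_to_rows(response.body) from parse and return one item per row; Scrapy's built-in feed export can then write the CSV, so rows_to_csv is only needed outside of Scrapy.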

Upvotes: 2
