Reputation: 339
I'm trying to parse a JSON response from the New York Times API with Scrapy into CSV, so that I can have a summary of all articles related to a particular query. I'd like to output a CSV with link, publication date, summary, and title so that I can run a few keyword searches on the summary description. I'm new to both Python and Scrapy, but here's my spider (I'm getting an HTTP 400 error). I've xx'ed out my API key in the spider:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from nytimesAPIjson.items import NytimesapijsonItem
import json
import urllib2

class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["http://api.nytimes.com/svc/search/v2/articlesearch"]
    req = urllib2.urlopen('http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx')

    def json_parse(self, response):
        jsonresponse = json.loads(response)
        item = NytimesapijsonItem()
        item["pubDate"] = jsonresponse["pub_date"]
        item["description"] = jsonresponse["lead_paragraph"]
        item["title"] = jsonresponse["print_headline"]
        item["link"] = jsonresponse["web_url"]
        items.append(item)
        return items
If anybody has any ideas/suggestions, including ones outside of Scrapy, please let me know. Thanks in advance.
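Since the asker is open to approaches outside Scrapy: the CSV step itself doesn't need a crawler at all. In the v2 Article Search response, the articles are nested under response -> docs, and the per-article fields (web_url, pub_date, lead_paragraph, headline) live on each doc, not at the top level. A minimal standard-library sketch, using an inline sample whose structure is assumed from that format (verify the field names against a live response):

```python
import csv
import json

# A trimmed sample of the v2 Article Search response shape
# (structure assumed; double-check against a real API response).
sample = json.loads("""
{
  "response": {
    "docs": [
      {
        "web_url": "http://www.nytimes.com/2013/09/16/example.html",
        "lead_paragraph": "Example summary paragraph.",
        "pub_date": "2013-09-16T00:00:00Z",
        "headline": {"main": "Example Headline", "print_headline": "Example Headline"}
      }
    ]
  }
}
""")

def docs_to_rows(data):
    """Flatten the nested docs into (link, pubDate, description, title) rows."""
    for doc in data["response"]["docs"]:
        yield [
            doc["web_url"],
            doc["pub_date"],
            doc["lead_paragraph"],
            doc["headline"]["print_headline"],
        ]

# Write the flattened rows out as the CSV the question describes.
with open("articles.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["link", "pubDate", "description", "title"])
    writer.writerows(docs_to_rows(sample))
```

In a real run you'd replace `sample` with the decoded body of the actual API request; the flattening logic stays the same.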
Upvotes: 1
Views: 3087
Reputation: 473763
You should set start_urls and use the parse method:
from scrapy.spider import BaseSpider
import json

class MySpider(BaseSpider):
    name = "nytimesapijson"
    allowed_domains = ["api.nytimes.com"]
    start_urls = ['http://api.nytimes.com/svc/search/v2/articlesearch.json?q="financial crime"&facet_field=day_of_week&begin_date=20130101&end_date=20130916&page=2&rank=newest&api-key=xxx']

    def parse(self, response):
        jsonresponse = json.loads(response.body)
        print jsonresponse
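From there, parse should walk response -> docs to build one item per article; the question's json_parse indexes jsonresponse["pub_date"] on the top-level object, which would fail because those keys live on each doc. A sketch of just the extraction step, kept free of Scrapy so it runs standalone (the doc field names are assumed from the v2 Article Search format; in the spider you'd call this from parse on the decoded body and yield one NytimesapijsonItem per dict):

```python
import json

def extract_items(body):
    """Pull the per-article fields out of a raw JSON body string.

    Returns one dict per article under response -> docs, keyed the
    same way as the question's NytimesapijsonItem fields.
    """
    data = json.loads(body)
    items = []
    for doc in data["response"]["docs"]:
        items.append({
            "link": doc["web_url"],
            "pubDate": doc["pub_date"],
            "description": doc["lead_paragraph"],
            "title": doc["headline"]["print_headline"],
        })
    return items

# Hypothetical one-article body, shaped like the API response.
body = ('{"response": {"docs": [{"web_url": "http://example.com", '
        '"pub_date": "2013-01-01", "lead_paragraph": "Summary.", '
        '"headline": {"print_headline": "Title"}}]}}')
print(extract_items(body))
```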
Upvotes: 2