WhiteDillPickle

Reputation: 185

How to Scrape JSON Data Using Scrapy

I'm using Scrapy and testing my selector in the Scrapy shell, but nothing is working. I'm trying to scrape the JSON data at this URL:

https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true

I've tried to scrape the data using the selector

   response.css("body > pre::text").extract()

However, this doesn't seem to be working. Not sure what's wrong...

Ideally, I just want to get all the "Name: XXX" elements from the JSON data, so if you know how to select those specifically, that would be very helpful as well!

Currently my code looks like this

    # -*- coding: utf-8 -*-
    import scrapy # needed to scrape
    import sys    # need to import xlrd
    sys.path.append("/Users/YoungFreeesh/anaconda3/lib/python3.6/site-packages/")  # needed to import xlrd
    import xlrd   # used to easily import xlsx file 

    class AmazonbotSpider(scrapy.Spider):
        name = 'ArchiveSpider'

        allowed_domains = ['web.archive.org']
        start_urls =['https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true']

        def parse(self, response):
            print(response.body)

Upvotes: 1

Views: 1073

Answers (1)

nosklo

Reputation: 222802

The content is inside an iframe, which is a separate page, so you have to request the iframe's URL first, just as you would follow a link. Something like this:

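    def parse(self, response):
        # The outer page only embeds the data; follow each iframe's src like a link.
        for url in response.css('iframe::attr(src)').extract():
            yield scrapy.Request(response.urljoin(url), callback=self.parse_iframe)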

Then define a new parse_iframe method in which you parse the iframe's response.
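As a rough sketch of what parse_iframe could do once it has the JSON text, the snippet below pulls out every value stored under a "Name" key, which is what the question asks for. The helper name collect_names and the sample payload are hypothetical; the real response's structure isn't shown here, so the helper walks the whole document instead of assuming a particular layout.

```python
import json


def collect_names(node, key="Name"):
    """Recursively gather every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            else:
                # Keep descending: the key may appear deeper in the structure.
                found.extend(collect_names(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_names(item, key))
    return found


# Hypothetical payload standing in for the API response; the real schema may differ.
sample = '[{"Id": 1, "Name": "Store A"}, {"Id": 2, "Name": "Store B", "Tags": [{"Name": "Shoes"}]}]'
names = collect_names(json.loads(sample))
print(names)  # ['Store A', 'Store B', 'Shoes']
```

Inside parse_iframe, something like `for name in collect_names(json.loads(response.text)): yield {'name': name}` should work if the iframe body is raw JSON. If the JSON turns out to be wrapped in HTML, you would need to extract the text first (for example from a pre element) before passing it to json.loads.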

Upvotes: 1
