Reputation: 185
I'm using Scrapy and testing my selector in the Scrapy shell, but nothing is working. I'm trying to scrape the JSON data on this website.
I've tried to scrape the data using this selector:
response.css("body > pre::text").extract()
However, this doesn't seem to be working. Not sure what's wrong...
Ideally, I just want to get all the "Name: XXX" elements from the JSON data, so if you know how to select those specifically, that would be very helpful as well!
Currently my code looks like this:
# -*- coding: utf-8 -*-
import scrapy  # needed to scrape
import sys  # needed so xlrd can be found

# append, not extend: extend() on a string adds each character as its own path entry
sys.path.append("/Users/YoungFreeesh/anaconda3/lib/python3.6/site-packages/")
import xlrd  # used to easily import xlsx file


class AmazonbotSpider(scrapy.Spider):
    name = 'ArchiveSpider'
    allowed_domains = ['web.archive.org']
    start_urls = ['https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true']

    def parse(self, response):
        print(response.body)
Upvotes: 1
Views: 1073
Reputation: 222802
Since the content is inside an iframe, it is a separate page, so you have to navigate to the iframe first. Treat it like a link, something like this:
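urls = response.css('iframe::attr(src)').extract()
for url in urls:
    # Request takes callback= (there is no target= argument); urljoin handles relative src values
    yield scrapy.Request(response.urljoin(url), callback=self.parse_iframe)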
Then define a new parse_iframe method where you parse the iframe's response.
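As a rough sketch of what that parse_iframe step could do for the "Name" values the question asks about: assuming the iframe's response body is plain JSON, a small recursive helper can collect every value stored under a "Name" key. The helper names (collect_names, names_from_text) and the sample payload are made up for illustration; the real API response may be shaped differently.

```python
import json


def collect_names(node, key="Name"):
    """Recursively gather every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending, since nested objects may carry the key too.
            found.extend(collect_names(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_names(item, key))
    return found


def names_from_text(text, key="Name"):
    """Parse a JSON string and return all values found under `key`."""
    return collect_names(json.loads(text), key)


# Hypothetical payload, only to show the idea; not the actual API output.
sample = '{"Tenants": [{"Name": "A", "Id": 1}, {"Name": "B", "Meta": {"Name": "C"}}]}'
print(names_from_text(sample))
```

Inside parse_iframe you could then call names_from_text(response.text) and yield or print the result, provided the body really is raw JSON rather than HTML wrapping it.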
Upvotes: 1