user14276694

Reputation: 53

scrapy get data dict from json dictionary

I am trying to get all of the data stored in this JSON as a dictionary that I can load and access. I am still new to writing spiders, but I believe I need something like response.xpath().extract() followed by json.load().split() to get an element from it. I am not sure of the exact syntax, though, since there are so many elements in this file.

Upvotes: 0

Views: 188

Answers (2)

gangabass

Reputation: 10666

You can use re_first() to extract the JSON from the JavaScript code, then parse it with loads() from the json module:

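import json

# Capture the object literal assigned to windows.PAGE_MODEL inside the <script> tag.
# (?s) lets "." match newlines; the dots in the variable name are escaped so they match literally.
d = response.xpath('//script[contains(., "windows.PAGE_MODEL")]/text()').re_first(r'(?s)windows\.PAGE_MODEL = (.+?\});')
data = json.loads(d)  # parse the captured text into a dict
property_id = data['propertyData']['id']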
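
The lazy pattern above stops at the first "};" it meets, which can cut the JSON short if that sequence also appears inside a string value. A minimal sketch of an alternative, assuming the assigned value is strict JSON: find the assignment, then let json.JSONDecoder().raw_decode() read exactly one JSON value and ignore whatever JavaScript follows. The helper name extract_js_object and the sample string are hypothetical; in a spider you would pass in the script text returned by .get().

```python
import json
import re


def extract_js_object(script_text, variable="PAGE_MODEL"):
    """Return the JSON object assigned to `variable` as a dict, or None if absent."""
    # Match "<variable> =" and stop right before the opening brace.
    match = re.search(re.escape(variable) + r"\s*=\s*(?=\{)", script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores anything after it (a trailing ";" or more JavaScript).
    data, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return data


# Hypothetical sample: note the "};" inside a string value.
sample = 'windows.PAGE_MODEL = {"propertyData": {"id": "123", "note": "a};b"}};\nvar other = 1;'
page_model = extract_js_object(sample)
print(page_model["propertyData"]["id"])
```

If the value is a JavaScript object literal that is not valid JSON (unquoted keys, single quotes), raw_decode raises json.JSONDecodeError, so this only helps when the page embeds real JSON.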

Upvotes: 2

Patrick Klein

Reputation: 1191

You're right, it pretty much works like you suggested in your question.
You can check the script tags for 'windows.PAGE_MODEL' with a simple xpath query.
Please try the following code in the callback for your request:

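from json import loads

# .get() returns the whole script text, so drop the "windows.PAGE_MODEL =" prefix and the trailing ";" before parsing.
# This assumes the script contains only that one assignment.
d = response.xpath('//script[text()[contains(., "windows.PAGE_MODEL")]]/text()').get()
data = loads(d.partition('=')[2].strip().rstrip(';'))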
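
Once data is a dict, one way to cope with the large number of elements is to list every key path before deciding which ones to read. The following is only a sketch: iter_key_paths is a hypothetical helper, and the sample dict stands in for the real page model, whose structure is not shown in the question.

```python
def iter_key_paths(value, prefix=""):
    """Yield (path, leaf_value) pairs for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_key_paths(child, child_prefix)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_key_paths(child, f"{prefix}[{index}]")
    else:
        # A scalar (str, number, bool, None): report where it lives.
        yield prefix, value


# Hypothetical stand-in for the parsed page model.
data = {"propertyData": {"id": "123", "images": [{"url": "a.jpg"}]}}
paths = dict(iter_key_paths(data))
for path, leaf in paths.items():
    print(path, "=", leaf)
```

Empty dicts and lists yield nothing here, so they will not appear in the listing.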

Upvotes: 0
