Reputation: 53
I am trying to get all of the data stored in this JSON as a dictionary that I can load and access. I am still new to writing spiders, but I believe I need something like response.xpath().extract() and then json.load().split() to get an element from it. I am not sure of the exact syntax, though, since there are so many elements in this file.
Upvotes: 0
Views: 188
Reputation: 10666
You can use re_first() to extract the JSON from the JavaScript code and then parse it with loads() from the json module:
import json
d = response.xpath('//script[contains(., "windows.PAGE_MODEL")]/text()').re_first(r'(?s)windows.PAGE_MODEL = (.+?\});')
data = json.loads(d)
property_id = data['propertyData']['id']
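To try the pattern without running a spider, here is a minimal sketch that uses only the standard library. The sample script text and the helper name extract_page_model are made up for illustration, and the dots in the pattern are escaped so they match literally. It also covers the case where nothing matches, since re_first() returns None then and json.loads(None) would raise.

```python
import json
import re

# Hypothetical sample standing in for the text of the matching <script> tag.
SAMPLE_SCRIPT = '''
    windows.PAGE_MODEL = {"propertyData": {"id": "12345", "bedrooms": 3}};
'''

# Same idea as the pattern above, with the dots escaped.
PAGE_MODEL_RE = re.compile(r'(?s)windows\.PAGE_MODEL = (.+?\});')


def extract_page_model(script_text):
    """Return the embedded model as a dict, or None if it is not found."""
    match = PAGE_MODEL_RE.search(script_text or '')
    if match is None:
        return None
    return json.loads(match.group(1))


model = extract_page_model(SAMPLE_SCRIPT)
property_id = model['propertyData']['id']
```

One caveat: the non-greedy match stops at the first "};" it meets, so a string value that happens to contain "};" would cut the JSON short.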
Upvotes: 2
Reputation: 1191
You're right, it pretty much works like you suggested in your question.
You can check the script tags for 'windows.PAGE_MODEL' with a simple xpath query.
Please try the following code in the callback for your request:
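from json import loads
script = response.xpath('//script[text()[contains(., "windows.PAGE_MODEL")]]/text()').get()
# The script text is a JavaScript assignment, not bare JSON: keep only the object literal.
# Assumes the assignment is the last statement in the script and ends with ';'.
payload = script.split('windows.PAGE_MODEL = ', 1)[1].strip().rstrip(';')
data = loads(payload)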
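Keep in mind that the script text is a JavaScript statement rather than bare JSON, so only the object literal should reach the parser. A rough sketch of one way to isolate it, using json.JSONDecoder.raw_decode from the standard library, which parses a single JSON value and ignores whatever follows it. The sample string and the helper name parse_page_model are hypothetical, and the marker is assumed to appear exactly as "windows.PAGE_MODEL = ".

```python
import json

MARKER = 'windows.PAGE_MODEL = '


def parse_page_model(script_text):
    """Parse the JSON object that follows MARKER, ignoring trailing JavaScript."""
    # str.index raises ValueError if the marker is missing.
    start = script_text.index(MARKER) + len(MARKER)
    # raw_decode does not skip leading whitespace, so do it by hand.
    while script_text[start].isspace():
        start += 1
    # raw_decode returns the parsed value and the index where it ended,
    # so a trailing ';' or further statements do not cause an error.
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    return data


# Hypothetical script text with statements before and after the assignment.
sample = 'var a = 1; windows.PAGE_MODEL = {"propertyData": {"id": 42, "note": "a};b"}}; var b = 2;'
data = parse_page_model(sample)
```

In a callback you would pass the string returned by .get() to the helper. Because the decoder tracks braces and strings itself, a value containing "};" does not confuse it.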
Upvotes: 0