Reputation: 21
I am using Scrapy to scrape job data from this website. One job page looks like this. The static data is easy to scrape with Scrapy, but the dynamic data generated by the Google Maps API, such as the "Distance" and "Time" fields, is giving me problems. I get "Distance Unknown" for the distance field and a blank value for the time field.
When I open the Chrome developer console and look at the Scripts section of the Network tab, I can see a JavaScript request ("DirectionsService.Route") made to the Google Maps API, and all the values I need are there in JSON format.
Is there a way to use Scrapy to get the JSON output generated by the Google Maps API?
If not, is there a way to make the Scrapy script wait for the page to load completely (so that the distance and time values are populated) and then scrape those values?
Upvotes: 0
Views: 2131
Reputation: 439
The issue is that Scrapy does not render JavaScript, and the Distance and Time fields are both populated by JavaScript.
You have a few options. You can use Splash (http://splash.readthedocs.org/en/latest/index.html), made by the same folks as Scrapy, or Selenium/PhantomJS.
The question "selenium with scrapy for dynamic page" has lots of links and info in its answers.
As for JSON and Scrapy, you can use Python's json library (import json) to load JSON into a Python dictionary, like this:
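# Inside an existing spider callback (needs: import json; from scrapy import Request)
json_url = 'http://www.whatever.com/whatever.json'
yield Request(json_url, callback=self.parse_json)

# A separate method on the same spider
def parse_json(self, response):
    # response.text is the decoded body; body_as_unicode() is deprecated in recent Scrapy releases
    json_dict = json.loads(response.text)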
If the URL you requested returns JSON, the data will now be in a Python dictionary called json_dict.
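Once the JSON is in a dictionary, getting the distance and time out is plain key lookup. Here is a minimal sketch of that step. The payload layout (`routes` / `legs` / `distance` / `duration`) is an assumption modelled on the public Google Directions web-service response, and the sample values are made up; the request your page makes may return a different structure, so check it in the Network tab first. The helper name `extract_distance_time` is hypothetical.

```python
import json

# Hypothetical sample body, shaped like the public Directions web-service
# response. The values are invented for illustration only.
SAMPLE_BODY = """
{
  "routes": [
    {"legs": [
      {"distance": {"text": "12.4 km", "value": 12400},
       "duration": {"text": "18 mins", "value": 1080}}
    ]}
  ]
}
"""


def extract_distance_time(body):
    """Return (distance_text, duration_text) from a directions-style JSON string.

    Returns (None, None) when the expected route/leg is missing, because
    the real payload layout is only an assumption here.
    """
    data = json.loads(body)
    try:
        # First leg of the first route
        leg = data["routes"][0]["legs"][0]
    except (KeyError, IndexError, TypeError):
        return None, None
    distance = leg.get("distance", {}).get("text")
    duration = leg.get("duration", {}).get("text")
    return distance, duration


if __name__ == "__main__":
    print(extract_distance_time(SAMPLE_BODY))
```

In a spider you would call it from the JSON callback, for example `distance, duration = extract_distance_time(response.text)`, and put the two values on your item.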
Upvotes: 2