Reputation: 7124
I'm trying to get a long JSON response (~75 MB) from a webpage; however, I can only receive the first 25 MB or so.
I've used urllib2 and python-requests, but neither works. I've tried reading parts in separately and streaming the data, but this doesn't work either.
An example of the data can be found here:
http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W
My code is as follows:
import requests

r = requests.get("http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W")
usgs_data = r.json()  # script breaks here
# Save Longitude and Latitude of river
latitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["latitude"]
longitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["longitude"]
# dictionary of all past river flows in cubic feet per second
river_history = usgs_data['value']['timeSeries'][0]['values'][0]['value']
It breaks with:
ValueError: Expecting object: line 1 column 13466329 (char 13466328)
when the script tries to decode the JSON (i.e., at usgs_data = r.json()).
This is because the full data hasn't been received and is therefore not a valid JSON object.
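To see why truncation produces this exact error: Python's json module raises ValueError (JSONDecodeError is a subclass of it in Python 3) the moment the document ends mid-object. A minimal illustration with a hand-truncated string:

```python
import json

# A JSON object cut off mid-stream, as happens when the server
# truncates the response body before the closing braces arrive.
truncated = '{"value": {"timeSeries": [{"sourceInfo"'

try:
    json.loads(truncated)
except ValueError as exc:
    # Same failure mode as r.json() on the partial response.
    print("decode failed:", exc)
```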
Upvotes: 1
Views: 1603
Reputation: 87064
The problem seems to be that the server won't serve more than 13MB of data at a time.
I have tried that URL using a number of HTTP clients including curl
and wget
, and all of them bomb out at about 13MB. I have also tried enabling gzip compression (as should you), but the results were still truncated at 13MB after decompression.
You are requesting too much data: period=P260W
specifies 260 weeks. If you set period=P52W
instead, you should find that you can retrieve a valid JSON response.
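If you really need all 260 weeks, one workaround is to split the range into 52-week windows and make one request per window. This is only a sketch: it assumes the service accepts the startDT/endDT ISO-8601 date parameters in place of period (check the USGS documentation), and the date-splitting helper below is hypothetical:

```python
from datetime import date, timedelta

def week_windows(end, total_weeks=260, window_weeks=52):
    """Split the total_weeks before `end` into (start, end) ISO-date pairs,
    newest window first, each at most window_weeks long."""
    windows = []
    hi = end
    remaining = total_weeks
    while remaining > 0:
        step = min(window_weeks, remaining)
        lo = hi - timedelta(weeks=step)
        windows.append((lo.isoformat(), hi.isoformat()))
        hi = lo
        remaining -= step
    return windows

# Five 52-week windows covering the original 260-week period.
for start, end in week_windows(date(2015, 1, 1)):
    print(start, end)
    # Each window would then be fetched separately, e.g. (untested):
    # requests.get(url, params={'site': 14377100, 'format': 'json',
    #                           'parameterCd': '00060',
    #                           'startDT': start, 'endDT': end})
```

The per-window responses each decode on their own, and the timeSeries value lists can then be concatenated client-side.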
To reduce the amount of data transferred, set the Accept-Encoding
header like this:
url = 'http://waterservices.usgs.gov/nwis/iv/'
params = {'site': 11527000, 'format': 'json', 'parameterCd': '00060', 'period': 'P52W'}
r = requests.get(url, params=params, headers={'Accept-Encoding': 'gzip,deflate'})
Upvotes: 3