Reputation: 45
I am trying to load a large dataset into Python through an API, but I can't retrieve the entire dataset: each request returns only the first 1000 rows.
import requests
import pandas as pd

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")
data = r.json()  # renamed from `json` to avoid shadowing the json module
df = pd.DataFrame(data)
df.drop(df.columns[[0, 1, 2, 3, 4, 5, 6, 7]], axis=1, inplace=True)  # dropping some columns
df.shape
The output is
(1000, 22)
The website contains almost 6 million records, yet only 1000 are retrieved. How do I get around this? Is chunking the right approach? Can someone please help me with the code?
Thanks.
Upvotes: 0
Views: 2757
Reputation: 1788
You'll need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request. According to the Socrata docs, you need to add `$limit` and `$offset` parameters to the request URL.
For example, for the first page of results you would start with -
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0
Then for the next page you would just increment the offset -
https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000
Continue incrementing until you have the entire dataset.
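Putting that loop into code, something like the sketch below should work. `fetch_all` is just a helper name I made up, not part of any library, and the page size is an assumption (the Socrata docs also suggest adding an `$order` clause so the paging order stays stable across requests):

```python
import requests
import pandas as pd

def fetch_all(url, limit=1000):
    """Fetch every page of a Socrata resource by incrementing $offset
    until an empty page comes back, then combine the pages into one DataFrame."""
    frames = []
    offset = 0
    while True:
        page = requests.get(url, params={"$limit": limit, "$offset": offset}).json()
        if not page:  # an empty page means we've walked past the last record
            break
        frames.append(pd.DataFrame(page))
        offset += limit
    return pd.concat(frames, ignore_index=True)

# For ~6 million rows you'd likely want a larger page size, e.g.:
# df = fetch_all("https://data.cityofchicago.org/resource/6zsd-86xi.json", limit=50000)
```

Note that pulling the full dataset this way will take a while and use a lot of memory; for datasets this size, Socrata also recommends registering for an app token so your requests aren't throttled as aggressively.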
Upvotes: 1