amatof
amatof

Reputation: 185

Using $limit and $offset to get more than 1,000 rows on a SODA API

I am pulling data with a SODA API using the following code in Python

response = requests.get('https://healthdata.gov/resource/uqq2-txqb.json')

The dataset contains 434,865 rows but when I use the API, it only returns the first 1,000 rows. I saw on another question that $limit can be used to get the first 50,000 rows but how would I combine this with $offset to get all 434,865 rows?

** I figured out how to use $offset and now have the resulting code, is there any way to condense this?

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50001')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100002')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150003')
response5 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=200004')
response6 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=250005')
response7 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=300006')
response8 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=350007')
response9 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=400008')

Upvotes: 2

Views: 3186

Answers (3)

Ilia Munaev
Ilia Munaev

Reputation: 1

# Initialize variables for pagination

limit = 50000
offset = 0
data = []

while True:

    # Set query parameters
    params = {
        '$limit': limit,
        '$offset': offset
    }

    # Make a GET request to the API endpoint with the query parameters
    response = requests.get(url, headers=headers, params=params)

Upvotes: 0

sal
sal

Reputation: 3593

This is referred to as paging, and you can find documentation for example here: https://dev.socrata.com/docs/paging.html

In there, it is also specified that there are two versions of API:

  • v2.0, where $limit can be max 50,000
  • v2.1, where $limit is unlimited

The endpoint you are using seems to be support v2.1, at least based on this https://dev.socrata.com/foundry/healthdata.gov/uqq2-txqb so you should be able to use a large value for $limit and retrieve the entire set at once.

Going the paging route, the $offset value is 0-based, hence your queries should be properly rewritten as:

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2xqb.json?$limit=50000&$offset=50000')
response3 = requests.get('https://healthdata.gov/resource/uqq2xqb.json?$limit=50000&$offset=100000')
response4 = requests.get('https://healthdata.gov/resource/uqq2xqb.json?$limit=50000&$offset=150000')

Note the alignment on multiples of $limit.

Upvotes: 2

amatof
amatof

Reputation: 185

response1 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000')
response2 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=50001')
response3 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=100002')
response4 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=150003')
response5 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=200004')
response6 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=250005')
response7 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=300006')
response8 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=350007')
response9 = requests.get('https://healthdata.gov/resource/uqq2-txqb.json?$limit=50000&$offset=400008')

Upvotes: 0

Related Questions