SjAnupa

Reputation: 132

How to get a larger amount of data from the Stack Exchange API?

The Stack Exchange API returns only 30 items per request. To get 4500 records, I used a for loop to call the Stack Exchange API 150 times, as shown below.

import json
import requests
complete_data=[]
for i in range (150):
    response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow")
    newData=json.loads(response.text)
    for item in newData['items']:
        complete_data.append(item)

But when I analyzed the questions returned by the API, I found the same data set repeated 150 times: every request in the loop returned the same results. I need close to 5000 records for my analysis. Can anyone show me what I should change in my code?

Upvotes: 0

Views: 958

Answers (1)

double-beep

Reputation: 5504

Every request fetches the same page (the first one) with the default 30 items. To solve the problem, set pagesize (min 1, max 100) and pass a different page (i + 1) on each iteration:

import json
import time

import requests

complete_data = []
for i in range(45):  # 45 pages x 100 items = 4500 records
    response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow&pagesize=100&page=" + str(i + 1))
    newData = json.loads(response.text)
    for item in newData['items']:
        complete_data.append(item)
    print("Processed page " + str(i + 1) + ", returned status " + str(response.status_code))
    time.sleep(2)  # pause between requests to avoid being rate-limited

Notes:

  • A 2-second pause is added between requests to avoid rate-limiting.
  • You may want to obtain an API key to increase your daily request quota from 300 to 10000.
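
If you'd rather not hard-code the number of pages, here is a rough sketch of the same idea that stops based on the response itself. It assumes the response wrapper carries the has_more and backoff fields and that an API key is passed as the key parameter; the function names (fetch_questions_page, collect_items) are made up for this example, so adapt as needed:

```python
import time


def fetch_questions_page(page, pagesize=100, key=None):
    """Fetch one page of questions and return the decoded JSON wrapper."""
    # Third-party library; imported here so collect_items() below has no
    # dependency on it and can be used with any page-fetching callable.
    import requests

    params = {
        "order": "desc",
        "sort": "activity",
        "site": "stackoverflow",
        "pagesize": pagesize,
        "page": page,
    }
    if key:
        params["key"] = key  # optional API key; supply your own
    response = requests.get("https://api.stackexchange.com/2.2/questions", params=params)
    response.raise_for_status()
    return response.json()


def collect_items(fetch_page, max_items, pause=2.0):
    """Request pages 1, 2, 3, ... until max_items are collected or no pages remain."""
    items = []
    page = 1
    while True:
        data = fetch_page(page)
        items.extend(data.get("items", []))
        # Stop once we have enough, or when the API reports no further pages.
        if len(items) >= max_items or not data.get("has_more"):
            break
        # Wait between requests; honor the server's 'backoff' (in seconds)
        # when it asks for a longer wait than our own pause.
        time.sleep(max(pause, data.get("backoff", 0)))
        page += 1
    return items[:max_items]
```

Usage would be something like complete_data = collect_items(fetch_questions_page, 4500). Passing the page fetcher as an argument is only a convenience so the paging loop can be tried out with a stub before spending any quota.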

Upvotes: 3
