RoshanShah22

Reputation: 420

How to paginate an API with a cursor?

Sample JSON response

{
    "Data": {
        "City": [
            {
                "loc": "Sector XYZ",
                "Country": "AUS"
            },
            ...
        ]
    },
    "Meta": {},
    "ResourceType": 40,
    "StatusCode": 200,
    "Message": null,
    "Cursor": "apicursor-ad39609e-5fb2-4a66-9402-6def95e75655"
}

The cursor is dynamic and changes with each paginated response; the next one might be "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52", and so on.

The next request URL will have the following format:

https://myurl123.com/api/V2/data/{}?size=10&cursor=apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52
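(Side note: the same URL can also be built with the params argument of requests instead of string formatting. A small sketch, where "cities" stands in for the real resource name, just to show the query string it produces:

import requests

req = requests.Request(
    "GET",
    "https://myurl123.com/api/V2/data/{}".format("cities"),  # "cities" is a placeholder
    params={"size": 10,
            "cursor": "apicursor-53ee8993-022c-41df-8be7-9bdedfd91e52"},
)
print(req.prepare().url)  # requests URL-encodes and appends the query string
)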

I cannot figure out how to paginate through the responses and append them to a DataFrame for very large datasets. Here is what I tried, but it does not include pagination:

import json
from io import StringIO

import pandas as pd
import requests


def foo(name):
    url = "https://myurl123.com/api/V2/data/{}?size=10".format(name)
    print(url)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }

    response = requests.get(url, headers=headers)
    try:
        x = response.json()
        # "Data" holds a single list (e.g. "City"); grab it whatever its key is
        xs = next(iter(x['Data'].values()))
        df = pd.read_json(StringIO(json.dumps(xs)), orient='records')
        df.reset_index(drop=True, inplace=True)
        return df
    except (ValueError, KeyError, StopIteration):
        print('fetch failed')

I just want to paginate through the API, collect all the data into one DataFrame, and return it from the function above.

I could not understand some of the other answers available here, so I'd like to apologize for any duplication. Thanks for your help and suggestions.

Upvotes: 0

Views: 3604

Answers (1)

C14L

Reputation: 12548

Did I understand correctly that you need to read the API again and again until you no longer get any data back? You could do it like this. The function get_data() returns all the rows of all the requests as one iterator; from the calling function, that just looks like one long list.

But that will take a long time for 100,000 rows, because it reads only 10 rows per request; that is 10,000 requests, one after the other.

import json
from io import StringIO

import pandas as pd
import requests


def get_data(name):
    csr = ""
    baseurl = "https://myurl123.com/api/V2/data/{}".format(name)
    headers = {
        'Authorization': 'ApiKey xyz123',
        'Content-Type': 'application/json'
    }

    while True:
        url = "{}?size=10&cursor={}".format(baseurl, csr)
        res = requests.get(url, headers=headers)
        res.raise_for_status()
        data = res.json()
        rows = (data.get("Data") or {}).get("City", [])
        if not rows:  # an empty page means everything has been read
            break
        csr = data["Cursor"]  # the cursor for the next page

        for row in rows:
            yield row


def get_df(name):
    data = get_data(name)

    # Materialize the iterator and load everything into one DataFrame
    df = pd.read_json(StringIO(json.dumps(list(data))), orient='records')
    df.reset_index(drop=True, inplace=True)

    return df
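Calling it is then one normal function call, for example (the name "cities" is just a placeholder):

df = get_df("cities")
print(len(df), "rows fetched")
print(df.head())

If the 10-rows-per-request limit is the bottleneck, one possible mitigation, assuming the API accepts a larger size parameter, is to request bigger pages and reuse one TCP connection with requests.Session. A sketch under those assumptions (get_data_fast and size=1000 are hypothetical):

import requests

session = requests.Session()  # one TCP connection reused across all requests
session.headers.update({
    'Authorization': 'ApiKey xyz123',
    'Content-Type': 'application/json'
})

def get_data_fast(name, size=1000):
    # Hypothetical variant of get_data(); assumes the API honours size=1000
    csr = ""
    baseurl = "https://myurl123.com/api/V2/data/{}".format(name)
    while True:
        res = session.get(baseurl, params={"size": size, "cursor": csr})
        res.raise_for_status()
        data = res.json()
        rows = (data.get("Data") or {}).get("City", [])
        if not rows:
            break
        csr = data["Cursor"]
        yield from rows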

Upvotes: 4
