Unable to loop through paged API responses with Python

Question

So, i'm scratching my head with this one. Using HubSpot's API, i need to get a list of ALL the companies in my client's "portal" (account). Sadly, the standard API call only returns 100 companies at a time. When it does return a response, it includes two parameters which make paging through responses possible.

One of those is "has-more": True (this lets you know if you can expect any more pages) and the other is "offset":12345678 (the timestamp to offset the request by.)

These two parameters are things you can pass back into the next API call to get the next page. So for example, the initial API call might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)

Whereas the follow up calls might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)

So this is what i've tried so far:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import os.path
import requests
import json
import csv
import glob2
import shutil
import time
import time as howLong
from time import sleep
from time import gmtime, strftime

HubSpot_Customer_Portal_ID = "XXXXXX"

wta_hubspot_api_key = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

findCSV = glob2.glob('*contact*.csv')

theDate = time=strftime("%Y-%m-%d", gmtime())
theTime = time=strftime("%H:%M:%S", gmtime())

try:
    testData = findCSV[0]
except IndexError:
    print ("
Syncronisation attempted on {date} at {time}: There are no "contact" CSVs, please upload one and try again.
").format(date=theDate, time=theTime)
    print("====================================================================================================================
")
    sys.exit()

for theCSV in findCSV:

    def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']

            while hasMore == True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    get_more_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    get_more_companies_call_response = requests.get(get_more_companies_call, headers=headers)
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))


        else:
            print("Something went wrong, check the supplied field values.
")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        get_companies()
        sys.exit()

The problem is that it just keeps returning the same intitial 100 results; this is happening because the parameter "has-more":True is true on the initial call, so it'll just keep returning the same ones...

My ideal scenario is that I'm able to parse ALL the companies across approximately 120 response pages (there are around 12000 companies). As I pass through each page, i'd like to append it's JSON content to a list, so that eventually I have this list which contains the JSON responses of all 120 pages, so that I can parse that list for use in a different function.

I am in desperate need of a solution :(

This is the function I am replacing in my main script:

            def get_companies():

                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/recent/modified?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
                headers = {'content-type': 'application/json'}
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
                if create_get_recent_companies_response.status_code == 200:

                    for i in create_get_recent_companies_response.json()[u'results']:
                        company_name = i[u'properties'][u'name'][u'value']
                        #print(company_name)
                        if row[0].lower() == str(company_name).lower():
                            contact_company_id = i[u'companyId']
                            #print(contact_company_id)
                            return contact_company_id
                else:
                    print("Something went wrong, check the supplied field values.
")
                    #print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

SiHa · Accepted Answer

The problem seems to be that:

You get the offset in your first call, but don't do anything with the actual companies data that this call returns.
You then use this same offset in your while loop; you never use the new one from subsequent calls. This is why you get the same companies every time.

I think this code for get_companies() should work for you. I can't test it, obviously, but hopefully it is OK:

def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            while True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))
                offset = create_get_recent_companies_response.json()[u'offset']
                hasMore = create_get_recent_companies_response.json()[u'has-more']
                if not hasMore:
                    break
                else:
                    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)


        else:
            print("Something went wrong, check the supplied field values.
")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

Strictly, the else after the break isn't required but it is in keeping with the Zen of Python "Explicit is better than implicit"

Note that you are only checking for a 200 response code once, if something goes wrong inside your loop you will miss it. You should probably put all your calls inside the loop and check for a proper response every time.

Unable to loop through paged API responses with Python

Answers (1)

Related Questions