Henrik Poulsen

Reputation: 995

Python REST API call speed

I'm currently working on a project that requires me to pull data from an online REST API and dump it into a local database for further processing.

The API is connected to an online invoicing platform, and I am a bit concerned about the speed of the script.

Currently, my script opens a connection and builds a list of invoices. For each invoice, the script then opens another connection to fetch that invoice's lines, which get appended to a second list.

I am currently reading an account containing 9 invoices and a total of 15 invoice lines. This takes 7.4 seconds to retrieve.
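A rough back-of-the-envelope check, assuming all 9 invoices have distinct ids so none are skipped: the script makes one request for the invoice list plus one per invoice, and the measured time averages out to well over half a second per round trip. That suggests per-request overhead, not data volume, dominates.

```latex
N_{\text{req}} = \underbrace{1}_{\text{invoice list}} + \underbrace{9}_{\text{one per invoice}} = 10,
\qquad
\bar{t} \approx \frac{7.4\ \text{s}}{10} = 0.74\ \text{s per request}
```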

Can anyone help me with my code? Perhaps it can be sped up a bit.

```python
# -*- coding: utf-8 -*-
import requests
from datetime import datetime

token = "b877aff346ec0c7d238c21a6c33929c84b13a110"


def request(accessToken, url):
    # Each call to requests.get() sets up a fresh connection to the API.
    link = 'https://api.billysbilling.com/v2/' + str(url)
    headers = {'X-Access-Token': accessToken}
    data = requests.get(link, headers=headers).json()
    return data


def invoiceLines(token):
    # One request for the list of invoices...
    inv = request(accessToken=token, url="invoices")["invoices"]
    idList = []
    invoiceLinesList = []
    lines = []
    # ...then one more request per distinct invoice id for its lines.
    for r in inv:
        if r["id"] not in idList:
            idList.append(r["id"])
            invoiceLinesList.append(request(accessToken=token,
                                            url="invoiceLines?invoiceId=" +
                                                str(r["id"]))["invoiceLines"])
    # Flatten the per-invoice lists into a single list of lines.
    for invoice in invoiceLinesList:
        for line in invoice:
            lines.append(line)
    return [inv, lines]


start = datetime.now()
data = invoiceLines(token)
print("Time spent - " + str(datetime.now() - start))
print("Invoices - " + str(len(data[0])))
print("Invoice lines - " + str(len(data[1])))
```

Thanks, Henrik

Upvotes: 0

Views: 2477

Answers (1)

Vasiliy Faronov

Reputation: 12310

Use a Requests session, which will automatically reuse one connection for multiple requests. For example:

```python
import requests


class BillingAPI(object):

    def __init__(self, token, root_url='https://api.billysbilling.com/v2/'):
        # One Session keeps a connection pool and sends the token header on every request.
        self._session = requests.Session()
        self._session.headers['X-Access-Token'] = token
        self.root_url = root_url

    def get(self, url_part):
        url = self.root_url + str(url_part)
        return self._session.get(url).json()


def invoiceLines(token):
    api = BillingAPI(token)
    inv = api.get('invoices')['invoices']
    # ...
```
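
One possible way to finish the invoice-lines loop so that every call goes through a single `BillingAPI` instance is sketched below. The function name `invoice_lines` is hypothetical, and it takes the fetch function as a parameter (you would pass `BillingAPI(token).get`) so it can be exercised without network access. It also swaps the `idList` list for a set, so the duplicate check doesn't rescan the list on every invoice. Treat it as a sketch, not the answerer's exact code.

```python
def invoice_lines(get_json):
    """Fetch all invoices and their lines through one shared fetch function.

    get_json is assumed to be a callable like BillingAPI(token).get:
    it takes a URL part and returns the decoded JSON response.
    """
    invoices = get_json('invoices')['invoices']
    seen_ids = set()  # O(1) membership test instead of scanning a list
    lines = []
    for invoice in invoices:
        invoice_id = invoice['id']
        if invoice_id in seen_ids:
            continue  # skip duplicate invoice ids, as the original code does
        seen_ids.add(invoice_id)
        # With BillingAPI.get, every call uses the same Session,
        # so the underlying connection can be reused between requests.
        response = get_json('invoiceLines?invoiceId=' + str(invoice_id))
        lines.extend(response['invoiceLines'])
    return invoices, lines
```

Usage would be along the lines of `invoices, lines = invoice_lines(BillingAPI(token).get)`. Returning a tuple rather than a two-element list lets the caller unpack the result directly.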

You can also try requesting invoice lines for different invoices in parallel. The requests-futures extension can help with that. But please be considerate so as not to overload the server.
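As an illustration of the parallel approach, the sketch below uses the standard library's `concurrent.futures.ThreadPoolExecutor` (Python 3.2+, or the `futures` backport on Python 2) rather than requests-futures, which follows a similar thread-pool idea. The function name `fetch_lines_parallel` and the default of 4 workers are assumptions; `max_workers` is what keeps the load on the server bounded. Sharing one `requests.Session` across threads is common practice but not formally documented as thread-safe, so a per-thread session is an option if that is a concern.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_lines_parallel(get_json, invoice_ids, max_workers=4):
    """Fetch invoice lines for several invoices concurrently (hypothetical helper).

    get_json: a callable taking a URL part and returning decoded JSON,
    e.g. BillingAPI(token).get (an assumption, not a library API).
    max_workers: upper bound on requests in flight at the same time.
    """
    def fetch_one(invoice_id):
        response = get_json('invoiceLines?invoiceId=' + str(invoice_id))
        return response['invoiceLines']

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of invoice_ids,
        # even if the individual requests finish out of order.
        per_invoice = list(pool.map(fetch_one, invoice_ids))

    # Flatten [[line, ...], [line, ...]] into one list.
    return [line for lines in per_invoice for line in lines]
```

Keeping the worker count small is the "be considerate" part: the server never sees more than `max_workers` simultaneous requests from this script.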

Other than this, you will probably be limited by the server performance of the API you’re using.

Consider whether you can employ a general optimization such as caching. The CacheControl library provides an easy HTTP cache for Requests, but it is unlikely that this API supports HTTP caching, so you will probably need to roll your own.
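A minimal roll-your-own cache could look like the sketch below: it remembers each URL part's JSON for a fixed number of seconds and only calls the API again once that entry has expired. The class name `TTLCache`, the 300-second default, and the injectable `clock` (useful for testing) are all hypothetical choices. It is in-memory only, so it lasts as long as the process; since the data ends up in a local database anyway, persisting it there would be the durable variant.

```python
import time


class TTLCache(object):
    """Hypothetical in-memory cache in front of a fetch function.

    fetch: callable taking a URL part and returning decoded JSON,
    e.g. BillingAPI(token).get (an assumption).
    ttl: how many seconds a cached response stays valid.
    """

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # url_part -> (timestamp, data)

    def get(self, url_part):
        now = self._clock()
        hit = self._store.get(url_part)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no network round trip
        data = self._fetch(url_part)
        self._store[url_part] = (now, data)
        return data
```

Because `TTLCache.get` has the same shape as `BillingAPI.get`, it can be dropped in wherever the uncached method was used, e.g. `TTLCache(BillingAPI(token).get).get('invoices')`.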

Upvotes: 1
