Reputation: 995
I'm working on a project that pulls data from an online REST API and dumps it into a local database for further processing.
The API belongs to an online invoicing platform, and I am a bit concerned about the speed of my script.
Currently my script opens a connection and builds a list of invoices. For each invoice, the script then opens another connection to fetch that invoice's lines, which are collected in a second list.
I am currently reading an account containing 9 invoices and a total of 15 invoice lines. This takes 7.4 seconds to retrieve.
Can anyone help me with my code? Perhaps I can speed it up a bit.
# -*- coding: utf-8 -*-
import requests
from datetime import datetime
token = "b877aff346ec0c7d238c21a6c33929c84b13a110"

def request(accessToken, url):
    link = 'https://api.billysbilling.com/v2/' + str(url)
    headers = {'X-Access-Token': accessToken}
    data = requests.get(link, headers=headers).json()
    return data

def invoiceLines(token):
    inv = request(accessToken=token, url="invoices")["invoices"]
    idList = []
    invoiceLinesList = []
    lines = []
    for r in inv:
        if r["id"] not in idList:
            idList.append(r["id"])
            invoiceLinesList.append(request(accessToken=token,
                                            url=str("invoiceLines?invoiceId=") +
                                            str(r["id"]))["invoiceLines"])
    for invoice in invoiceLinesList:
        for line in invoice:
            lines.append(line)
    return [inv, lines]

start = datetime.now()
data = invoiceLines(token)
print("Time spent - " + str(datetime.now() - start))
print("Invoices - " + str(len(data[0])))
print("Invoice lines - " + str(len(data[1])))
Thanks Henrik
Reputation: 12310
Use a Requests session, which will automatically reuse one connection for multiple requests. For example:
class BillingAPI(object):
    def __init__(self, token, root_url='https://api.billysbilling.com/v2/'):
        self._session = requests.Session()
        self._session.headers['X-Access-Token'] = token
        self.root_url = root_url

    def get(self, url_part):
        url = self.root_url + str(url_part)
        return self._session.get(url).json()

def invoiceLines(token):
    api = BillingAPI(token)
    inv = api.get('invoices')['invoices']
    # ...
You can also try requesting invoice lines for different invoices in parallel. The requests-futures extension can help with that. But please be considerate in order to not overload the server.
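A minimal sketch of the parallel approach, assuming Python 3 and using the standard library's `concurrent.futures.ThreadPoolExecutor` in place of requests-futures. The names `fetch_invoice_lines`, `get` and the `max_workers` default are illustrative and not part of any library. `get` is assumed to be a callable, such as `BillingAPI.get`, that takes a URL part and returns parsed JSON. Sharing one Requests session across threads is not guaranteed to be safe, so treat that as an assumption to verify.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_invoice_lines(get, invoice_ids, max_workers=4):
    """Fetch the lines of several invoices concurrently.

    `get` is any callable taking a URL part and returning parsed JSON
    (hypothetical interface, modelled on BillingAPI.get).
    """
    def fetch_one(invoice_id):
        return get('invoiceLines?invoiceId=' + str(invoice_id))['invoiceLines']

    # A small pool keeps the number of simultaneous requests modest,
    # so the server is not flooded.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as invoice_ids.
        per_invoice = list(pool.map(fetch_one, invoice_ids))

    # Flatten the per-invoice lists into one list of lines.
    return [line for lines in per_invoice for line in lines]
```

With the `BillingAPI` class, this might be called as `fetch_invoice_lines(api.get, [r['id'] for r in inv])`.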
Other than this, you will probably be limited by the server performance of the API you’re using.
Consider whether you can employ a general optimization such as caching. The CacheControl library provides an easy HTTP cache for Requests, but this API is unlikely to support HTTP caching, so you will probably need to roll your own.
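One way a hand-rolled cache could look, as a rough sketch only: an in-memory dictionary keyed by URL part, with a fixed maximum age. The class name `CachedGetter`, the `max_age` default of 300 seconds and the injectable `clock` are all assumptions made for illustration. Whether stale invoice data is acceptable for that long depends on your use case.

```python
import time


class CachedGetter(object):
    """Wrap a `get(url_part)` callable with a simple in-memory cache.

    Hypothetical helper: `get` could be BillingAPI.get or any callable
    that takes a URL part and returns parsed JSON.
    """

    def __init__(self, get, max_age=300, clock=time.time):
        self._get = get
        self._max_age = max_age  # seconds a cached response stays fresh
        self._clock = clock      # injectable so tests can control time
        self._cache = {}         # url_part -> (timestamp, data)

    def get(self, url_part):
        now = self._clock()
        entry = self._cache.get(url_part)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]      # still fresh: skip the network round trip
        data = self._get(url_part)
        self._cache[url_part] = (now, data)
        return data
```

Invoices that are already finalized rarely change, so their lines are a natural candidate for a long `max_age`, while the invoice list itself probably needs a shorter one.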