Reputation: 815
I'm trying to query an API for some data, but my queries can be very long, causing the server to reject them with a 414 Request-URI Too Long error instead of returning data. To work around this, I'm splitting the query into batches, with the intent of saving each batch's response as JSON and then reading the files into pandas down the line for further analysis/manipulation. The queries are constructed as expected and the API returns the requested data when sent in batches; however, when I write to file, not all of the data is written, and what is written is the same data that was already written.
My code so far is below, followed by an example response from the API. I can't easily tell where I'm going wrong. Is there something I should be doing differently (or better)?
import pandas as pd
import requests
import json
import glob
import yaml
conf = "config.yml"
with open(conf) as f:
    config = yaml.safe_load(f)
# Certs to access API
cert = config['cert']
key = config['key']
# Data to append to API url
alist = ['123', '456', '789', 'abc', 'def', 'xyz', '123abc', 'input1, input2']
# URL too long, send data in batches
# Create batches to send API requests
num_batches = 4
batch_size = int(len(alist)/num_batches)
batches = []
for i in range(0, len(alist), batch_size):
    batches.append(alist[i:i + batch_size])
urlprefix = "https://test_url.com/"
urlsuffix = "=json?url_suffix"
# API call
for batch in batches:
    APIquery = ",".join(batch)
    url = urlprefix+APIquery+urlsuffix
    print(url)
    response = requests.get(url, data=json.dumps(url), cert=(cert,key))
    jsonResponse = response.json()
    print(jsonResponse)
# Write data from each batch to json file
for i in range(0,num_batches):
    with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
        json.dumps(jsonResponse, f, indent=4)
df = pd.concat(map(pd.read_json, glob.glob('data/output/*.json')))
df.head()
Example Response:
[
    {
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value"
    },
    {
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value",
        "some attribute": "some value"
    }
]
Upvotes: 0
Views: 1027
Reputation: 506
For one, you're writing the same (last) response 4 times, because the write loop runs only after the request loop has finished:
# Write data from each batch to json file
for i in range(0,num_batches):
    with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
        json.dumps(jsonResponse, f, indent=4)
should probably be:
response_cnt = 0
for batch in batches:
    ...
    # Write data from each batch to json file
    os.makedirs("data/output", exist_ok=True)
    with open("data/output/output_" + str(response_cnt) + ".json", "w") as f:
        json.dump(jsonResponse, f, indent=4)
    response_cnt += 1
where response_cnt is a variable declared outside the for batch in batches: loop and incremented after each iteration. Note also that os.makedir should be os.makedirs (called on its own line, since it returns None), json.dumps should be json.dump when writing to a file object, and the file needs to be opened in write mode ("w").
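The counter bookkeeping can also be folded away with enumerate(). A minimal sketch of just the corrected write step, using hypothetical in-memory responses in place of the real API data:

```python
import json
import os

# Hypothetical stand-ins for the parsed JSON returned by each batch's API call
batch_responses = [
    [{"id": "123"}, {"id": "456"}],
    [{"id": "789"}, {"id": "abc"}],
]

# os.makedirs (not os.makedir) creates the directory; call it once, on its own
os.makedirs("data/output", exist_ok=True)

# enumerate() supplies the per-batch counter automatically
for i, json_response in enumerate(batch_responses):
    # open in write mode and use json.dump (not json.dumps) to write to the file
    with open(f"data/output/output_{i}.json", "w") as f:
        json.dump(json_response, f, indent=4)
```

Moving the write inside the loop this way means each response is written as soon as it arrives, so nothing is overwritten or duplicated.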
Upvotes: 1