andrewr

Reputation: 815

Write Multiple API Calls to Individual JSON Files

I'm trying to query an API for some data, but my queries can get very long, and the server then refuses them (414 Request-URI Too Large). To work around this, I split the inputs into batches and send one call per batch, with the intent of saving the response from each call as JSON and reading the files into pandas later for further analysis/manipulation. My queries are constructed as expected and the API returns the requested data for each batch. However, when I write to file, not all of the data is written, and what is written repeats data that was already written.

My code so far is below. I can't easily tell where I'm going wrong. Is there something I should be doing differently (or better)? Below my code is an example of a response from the API.

import pandas as pd
import requests
import json
import glob
import yaml

conf = "config.yml"

with open(conf) as f:
    config = yaml.safe_load(f)

# Certs to access API
cert = config['cert'] 
key = config['key']

# Data to append to API url
alist = ['123', '456', '789', 'abc', 'def', 'xyz', '123abc', 'input1, input2']

# URL too long, send data in batches

# Create batches to send API requests
num_batches = 4
batch_size = int(len(alist)/num_batches)
batches = []

for i in range(0, len(alist), batch_size): 
    batches.append(alist[i:i + batch_size])

urlprefix = "https://test_url.com/"
urlsuffix = "=json?url_suffix"

# API call
for batch in batches:
    APIquery = ",".join(batch)
    url = urlprefix+APIquery+urlsuffix
    print(url)

    response = requests.get(url, data=json.dumps(url), cert=(cert,key))
    jsonResponse = response.json()
    print(jsonResponse)
    
    # Write data from each batch to json file
    for i in range(0,num_batches):
        with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
            json.dumps(jsonResponse, f, indent=4)

df = pd.concat(map(pd.read_json, glob.glob('data/output/*.json')))
df.head()

Example Response:

[
    {
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
    },
    {
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
        "some attribute":"some value"
    },
]

Upvotes: 0

Views: 1027

Answers (1)

Chaos_Is_Harmony

Reputation: 506

For one, it appears as though you're writing the same response 4 times, because the inner loop runs num_batches times for every batch:

# Write data from each batch to json file
for i in range(0,num_batches):
   with open(os.makedir(os.path.dirname("data/output"), exist_ok=True)+"/output_"+i+".json") as f:
       json.dumps(jsonResponse, f, indent=4)

should probably be:

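response_cnt = 0

# Create the output directory once (needs "import os" at the top of the script)
os.makedirs("data/output", exist_ok=True)

for batch in batches:

    ...    

    # Write data from each batch to its own json file
    # ("w" opens for writing; json.dump, not json.dumps, writes to a file)
    with open("data/output/output_" + str(response_cnt) + ".json", "w") as f:
        json.dump(jsonResponse, f, indent=4)

    response_cnt += 1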

where response_cnt is a counter declared outside the for batch in batches: loop and incremented after each iteration, so each batch's response ends up in its own file.
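A few other things in the write step would also bite you: os.makedir does not exist (the function is os.makedirs, and it returns None, so it can't be concatenated into a path), open() defaults to read mode, json.dumps returns a string rather than writing to a file, and an int can't be joined to a str with +. Below is a minimal, self-contained sketch of one file per batch. The helper names make_batches and write_batch_responses are made up for illustration, the API call is replaced by fake responses, and the temporary directory is only there so the snippet runs anywhere; in your script you would pass "data/output" and the real response.json() values.

```python
import json
import os
import tempfile


def make_batches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def write_batch_responses(responses, out_dir):
    """Write each response to its own file: out_dir/output_<n>.json."""
    os.makedirs(out_dir, exist_ok=True)  # returns None, so call it on its own line
    paths = []
    for n, payload in enumerate(responses):
        path = os.path.join(out_dir, f"output_{n}.json")
        with open(path, "w") as f:  # "w" is required; open() defaults to read mode
            json.dump(payload, f, indent=4)  # dump writes to f; dumps only returns a str
        paths.append(path)
    return paths


# Same input list as in the question.
alist = ['123', '456', '789', 'abc', 'def', 'xyz', '123abc', 'input1, input2']
batches = make_batches(alist, batch_size=2)

# Stand-in for the API call (assumption): one fake list of records per batch.
responses = [[{"id": item} for item in batch] for batch in batches]

# Temporary directory used only so the sketch is runnable anywhere.
out_dir = os.path.join(tempfile.mkdtemp(), "data", "output")
paths = write_batch_responses(responses, out_dir)
print(paths)
```

Using enumerate does the same job as a hand-maintained response_cnt, and creating the directory once before the loop keeps the path logic out of the open() call.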

Upvotes: 1
