Programmer120

Reputation: 2592

How to convert the response from requests.get to a DataFrame?

I have the following code:

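import json

import pandas as pd
import requests

def flatten_json(y):
    """Flatten nested dicts into a single level, joining keys with '_'."""
    out = {}
    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        else:
            # lists and scalar values are stored as-is
            out[name[:-1]] = x
    flatten(y)
    return out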

def importdata(data):
    responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
    return responsedata


def generatejson(response):
    # Generate flat json file
    sample_object = pd.DataFrame(response.json())['results'].to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

response = importdata(data)
flat_json = generatejson(response)

Sample of what importdata(data) returns: https://textuploader.com/dz30p

This code sends a GET request to the API, parses the result, and generates a JSON file.

This works great.

Now I want to modify the importdata function to support pagination (multiple calls whose results are merged together).

So I wrote this code:

def impordatatnew():
    ...
    is_valid = True
    value_offset = 0
    value_limit = 100
    datarALL = []
    while is_valid:
        is_valid = False
        urlApi = 'http://....?offset={1}&limit={2}&startDate={0}'.format(
            requestedDate, value_offset, value_limit)
        responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
        if responsedata.status_code == 200:  # Use status code to check request status, 200 for successful call
            responsedata = responsedata.text
            value_offset = value_offset + value_limit
            # parse this page's JSON body
            jsondata = json.loads(responsedata)
            if "results" in jsondata:
                if jsondata["results"]:
                    is_valid = True
            if is_valid:
                # concat array by + operand
                datarALL = datarALL + jsondata["results"]
        else:
            # TODO: handle other status codes
            print(responsedata.status_code)
    return datarALL

This code uses pagination: it connects to the API, fetches the results page by page, and combines them into a single list. If I do:

print(json.dumps(datarALL))

I see the combined JSON, so this works great. Example of the dump: https://jsonblob.com/707ead1c-9891-11e8-b651-496f6b276e89

Sample of the returned datarALL:

https://textuploader.com/dz39d

My Problem:

I can't get the return value of impordatatnew() to work with generatejson(). How can I make the return value of impordatatnew() compatible with generatejson()? I tried modifying it as follows:

def generatejsonnew(response):
    #Generate flat json file
    sample_object = pd.DataFrame(response.json()).to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

It gives:

sample_object = pd.DataFrame(response.json()).to_dict()
AttributeError: 'list' object has no attribute 'json'

I understand why, but I don't know how to solve it. I can't seem to make this conversion work.

Upvotes: 2

Views: 6088

Answers (1)

John Zwinck

Reputation: 249384

It's not working because you do this:

responsedata = responsedata.text   
jsondata = json.loads(responsedata)
datarALL = datarALL + jsondata["results"]

It seems like what you're doing here is building a list incrementally. You could simplify it to:

datarALL += responsedata.json()["results"]  # responsedata is still the Response object here
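The whole pagination loop can be reduced along the same lines. This is a minimal sketch, not the asker's exact code: it assumes each page's JSON body is a dict with a "results" list (as in the samples) and that an empty list marks the last page. fetch_all_results and fetch_page are hypothetical names; fetch_page stands in for the requests.get(...).json() call so the loop can be tried without the real API.

```python
from typing import Callable, Dict, List


def fetch_all_results(fetch_page: Callable[[int, int], Dict], limit: int = 100) -> List:
    """Collect every item from an offset/limit-paginated API.

    fetch_page(offset, limit) is a hypothetical callable returning the parsed
    JSON body of one page, e.g. requests.get(...).json().
    """
    all_results = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        results = page.get("results") or []
        if not results:
            break  # a missing or empty "results" list means no more pages
        all_results += results  # extend the list in place with this page's items
        offset += limit
    return all_results


# Hypothetical wiring with requests (url and requestedDate are placeholders):
# def fetch_page(offset, limit):
#     resp = requests.get(url, params={"offset": offset, "limit": limit,
#                                      "startDate": requestedDate},
#                         headers=hed, verify=False)
#     resp.raise_for_status()  # raise on non-2xx instead of silently stopping
#     return resp.json()
```

Passing the page fetcher in as a function keeps the paging logic separate from the HTTP details, so it can be checked with fake pages before pointing it at the real API.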

The problem comes later:

pd.DataFrame(response.json())

This fails because you are calling .json() again on something that has already been parsed from JSON into a Python list. Hence the error message.

But the real head-scratcher is why you're doing this:

sample_object = pd.DataFrame(response.json()).to_dict()

That isn't really "using pandas", other than to reshape a list into a dict. Surely there is a more direct way of doing that, such as using a for loop to build the dict (exactly how, we can't tell without sample data).
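A minimal sketch of such a loop, assuming (not confirmed by the question) that each element of datarALL is one record dict like the items under "results". In the original code, pd.DataFrame(...)['results'].to_dict() keyed each record by its default row position (0, 1, 2, ...), so enumerate() reproduces that without pandas. generatejson_from_list is a hypothetical name, and flatten_json is repeated so the block runs on its own.

```python
import json


def flatten_json(y):
    """Flatten nested dicts into one level, joining keys with '_'.

    Lists and scalars are stored as-is, like the helper in the question.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        else:
            out[name[:-1]] = x

    flatten(y)
    return out


def generatejson_from_list(results):
    # Hypothetical replacement for generatejsonnew(): takes the plain list
    # returned by impordatatnew() and keys each flattened record by its position.
    flat = {i: flatten_json(item) for i, item in enumerate(results)}
    return json.dumps(flat, sort_keys=True)
```

For example, generatejson_from_list([{"id": 1, "user": {"name": "a"}}]) returns '{"0": {"id": 1, "user_name": "a"}}'; json.dumps turns the integer keys into strings.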

Anyway, if you want to populate a DataFrame, simply remove the .json() part and it should work similarly to your original non-paginating code.

But the far more efficient way is to construct a DataFrame per page using your original code and then call pd.concat(pages), where pages is the list of those DataFrames. There is then no need to build datarALL at all.
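A minimal sketch of the per-page approach, assuming each page has already been parsed into a dict with a "results" list of record dicts (as in the question's samples). pages_to_frame is a hypothetical helper name.

```python
import pandas as pd


def pages_to_frame(pages):
    """Build one DataFrame per page and concatenate them.

    `pages` is assumed to be a list of parsed page bodies, each a dict with a
    "results" list of record dicts.
    """
    frames = [pd.DataFrame(page["results"]) for page in pages]
    if not frames:
        return pd.DataFrame()  # pd.concat raises ValueError on an empty list
    # ignore_index=True renumbers rows 0..n-1 instead of repeating each page's index
    return pd.concat(frames, ignore_index=True)
```

If the records contain nested dicts that should be flattened in a similar way to flatten_json, pd.json_normalize(page["results"], sep="_") can be used instead of pd.DataFrame(...) for each page.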

Ultimately, your code can be simplified much further, to end up like this:

pd.concat(pd.read_json(url, ...) for url in all_page_urls)

That is, first you use a for loop to build all_page_urls, then you use the above one-liner to collect all the data into a single DataFrame.
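A sketch of both steps under stated assumptions: build_page_urls is a hypothetical helper that assumes the total record count is known in advance (the question does not say where it would come from), and the base URL is a placeholder. It also assumes each page URL returns a JSON array of records; if the body is wrapped like {"results": [...]}, read_json will not extract that list into columns for you. Two in-memory payloads stand in for real URLs so the one-liner can run offline.

```python
from io import StringIO

import pandas as pd


def build_page_urls(base_url, start_date, total, limit=100):
    """Return one URL per page for an offset/limit API (hypothetical helper)."""
    return [
        f"{base_url}?offset={offset}&limit={limit}&startDate={start_date}"
        for offset in range(0, total, limit)
    ]


all_page_urls = build_page_urls("http://example.com/api", "2018-08-01", 250)

# pd.read_json accepts a URL, a path, or a file-like object; these StringIO
# payloads replace all_page_urls here so no network access is needed.
fake_pages = [StringIO('[{"id": 1}, {"id": 2}]'), StringIO('[{"id": 3}]')]
df = pd.concat((pd.read_json(page) for page in fake_pages), ignore_index=True)
```

With real endpoints, the last line would iterate over all_page_urls instead of fake_pages.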

Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html#pandas.read_json

Upvotes: 1
