Reputation: 2592
I have the following code:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            out[name[:-1]] = x
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
def importdata(data):
    responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
    return responsedata

def generatejson(response):
    # Generate flat JSON file
    sample_object = pd.DataFrame(response.json())['results'].to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)

response = importdata(data)
flat_json = generatejson(response)
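For a quick sanity check, the flattening helper behaves like this on a small nested dict (the sample data is made up; the function is copied from above so the snippet runs standalone):

```python
def flatten_json(y):
    # same helper as above: nested dict keys become underscore-joined paths
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            out[name[:-1]] = x
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

# nested dicts are flattened; lists are kept whole under their flattened key
print(flatten_json({"a": {"b": 1, "c": [2, 3]}, "d": 4}))
# -> {'a_b': 1, 'a_c': [2, 3], 'd': 4}
```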
Sample of what importdata(data) returns:
https://textuploader.com/dz30p
This code sends a GET request to the API, gets the results, parses them, and generates a JSON file.
This works great.
Now, I want to modify the importdata function to support pagination (multiple calls whose results are merged together). So I wrote this code:
def impordatatnew():
    ...
    is_valid = True
    value_offset = 0
    value_limit = 100
    datarALL = []
    while is_valid:
        is_valid = False
        urlApi = 'http://....?offset={1}&limit={2}&startDate={0}'.format(
            requestedDate, value_offset, value_limit)
        responsedata = requests.get(urlApi, data=data, headers=hed, verify=False)
        if responsedata.status_code == 200:  # 200 means the call succeeded
            responsedata = responsedata.text
            value_offset = value_offset + value_limit
            # merge the results of the GET requests
            jsondata = json.loads(responsedata)
            if "results" in jsondata:
                if jsondata["results"]:
                    is_valid = True
            if is_valid:
                # concatenate lists with the + operator
                datarALL = datarALL + jsondata["results"]
        else:
            # TODO: handle other status codes
            print responsedata.status_code
    return datarALL
This code uses pagination: it connects to the API, gets the results page by page, and combines them into a single list. If I do:
print json.dumps(datarALL)
I see the combined JSON, so this works great.
Example of the dump:
https://jsonblob.com/707ead1c-9891-11e8-b651-496f6b276e89
Sample of the returned datarALL:
https://textuploader.com/dz39d
My Problem:
I can't seem to make the return value of impordatatnew() work with generatejson(). How can I make the return value of impordatatnew() compatible with generatejson()? I tried to modify it as follows:
def generatejsonnew(response):
    # Generate flat JSON file
    sample_object = pd.DataFrame(response.json()).to_dict()
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)
It gives:
sample_object = pd.DataFrame(response.json()).to_dict()
AttributeError: 'list' object has no attribute 'json'
I understand the error, but I don't know how to solve it. I can't seem to make this conversion work.
Upvotes: 2
Views: 6088
Reputation: 249384
It's not working because you do this:
responsedata = responsedata.text
jsondata = json.loads(responsedata)
datarALL = datarALL + jsondata["results"]
It seems like what you're doing here is incrementally building a list. You could simplify it to:
datarALL += responsedata.json()["results"]
The problem comes later:
pd.DataFrame(response.json())
This is because you are calling json()
again on something which has already been parsed from JSON to a Python list. Hence the error message.
But the real head-scratcher is why you're doing this:
sample_object = pd.DataFrame(response.json()).to_dict()
Which isn't really "using Pandas" other than to reformulate a list into a dict. Surely there is a more direct way of doing that, such as using a for loop to build the dict (exactly how, we can't tell without sample data).
Anyway, if you want to populate a DataFrame, simply remove the .json() part and it should work similarly to your original non-paginating code.
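One way to sketch that fix, assuming impordatatnew() returns a plain list of result dicts as in the question. Note that to_dict(orient="index") is used here to reproduce the {row_index: row_dict} shape that the original ['results'].to_dict() produced, so the flattening helper still receives dicts with string keys (the helper is copied in so the snippet runs standalone):

```python
import json
import pandas as pd

def flatten_json(y):
    # flattening helper, as defined in the question
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            out[name[:-1]] = x
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

def generatejsonnew(results):
    # 'results' is already a parsed Python list, so there is no .json() to call;
    # orient="index" yields {row_index: row_dict}, matching the shape the
    # original pd.DataFrame(...)['results'].to_dict() produced
    sample_object = pd.DataFrame(results).to_dict(orient="index")
    flat = {k: flatten_json(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)
```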
But the far more efficient way is to simply construct a DataFrame per page using your original code, and then call pd.concat(pages), where pages is the list of those DataFrames. No need to build datarALL then.
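A minimal sketch of that per-page approach; the fetch_page callable is a stand-in for the per-page requests.get(...).json()["results"] call in the question:

```python
import pandas as pd

def fetch_all(fetch_page, limit=100):
    # fetch_page(offset, limit) returns one page's "results" list;
    # in the question this would wrap requests.get(...).json()["results"]
    pages = []
    offset = 0
    while True:
        results = fetch_page(offset, limit)
        if not results:  # an empty page means we've run out of data
            break
        pages.append(pd.DataFrame(results))
        offset += limit
    # one concat at the end, instead of growing a list page by page
    return pd.concat(pages, ignore_index=True)
```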
Ultimately your code can be simplified so much more, to end up like this:
pd.concat(pd.read_json(url, ...) for url in all_page_urls)
That is, first you use a for loop to build all_page_urls, then you use the above one-liner to collect all the data into a single DataFrame.
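A sketch of that shape; the offset values and URL template below are placeholders, and pd.read_json accepts URLs as well as file-like objects, so each page parses straight into a DataFrame:

```python
import pandas as pd

def load_pages(sources):
    # sources: an iterable of page URLs (or anything pd.read_json accepts);
    # one concat at the end merges all the per-page DataFrames
    return pd.concat((pd.read_json(src) for src in sources), ignore_index=True)

# hypothetical page URL list built from the question's offset/limit scheme
all_page_urls = [
    'http://....?offset={}&limit=100&startDate=...'.format(off)
    for off in range(0, 300, 100)
]
# combined = load_pages(all_page_urls)  # would fetch and merge all pages
```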
Ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html#pandas.read_json
Upvotes: 1