Reputation: 75
I want to transform the result of a call from an API to a dataframe. The result of the API call is a nested dictionary, but the produced dataframe is not as I need it.
In addition to json_normalize, I tried pd.DataFrame.from_dict. However, until now had been unsuccessful. I also tried to flatten the dictionary, but nothing.
I used the following call:
[73] results = requests.get(url).json()
results
And the output was:
{'result': {'totalrows': 3124,
'rows': [{'rownum': 1,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1089387}]},
{'rownum': 2,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1086893}]},
{'rownum': 3,
'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683},
{'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'},
{'field': 'issuetitle', 'value': 'COM'},
{'field': 'filerid', 'value': 1085803}]}
Then to produce the data frame, I used the following code:
[74] Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()
This was the output:
rownum values
0 1 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1089387}
1 2 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
{'field': 'issueid', 'value': 472683}, {'field':
'ticker', 'value': 'AAPL'}, {'field': 'companyname',
'value': 'APPLE INC'}, {'field': 'issuetitle', 'value':
'COM'}, {'field': 'filerid', 'value': 1086893}
2 3 [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field':
'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'},
{'field': 'companyname', 'value': 'APPLE INC'}, {'field':
'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}
However, I want to obtain a DataFrame with the following format:
Upvotes: 4
Views: 1348
Reputation: 22473
Naive nested for loop attempt...
import pandas as pd
df = pd.DataFrame([])
for row in json["result"]["rows"]:
rownum = row["rownum"]
querydate = issueid = ticker = companyname = issuetitle = filerid = None
for value_dict in row["values"]:
if value_dict["field"] == "querydate":
querydate = value_dict["value"]
elif value_dict["field"] == "issueid":
issueid = value_dict["value"]
elif value_dict["field"] == "ticker":
ticker = value_dict["value"]
elif value_dict["field"] == "companyname":
companyname = value_dict["value"]
elif value_dict["field"] == "filerid":
filerid = value_dict["value"]
df = df.append(pd.DataFrame({"rownum": rownum,
"querydate": querydate,
"issueid": issueid,
"ticker": ticker,
"companyname": companyname,
"issuetitle": issuetitle,
"filerid": filerid,
}, index=[0]), ignore_index=True)
print(df)
JSON object:
json = {
"result": {
"totalrows": 3,
"rows": [
{
"rownum": 1,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1089387
}
]
},
{
"rownum": 2,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1086893
}
]
},
{
"rownum": 3,
"values": [
{
"field": "querydate",
"value": "7/31/2019 3:19 PM"
},
{
"field": "issueid",
"value": 472683
},
{
"field": "ticker",
"value": "AAPL"
},
{
"field": "companyname",
"value": "APPLE INC"
},
{
"field": "issuetitle",
"value": "COM"
},
{
"field": "filerid",
"value": 1085803
}
]
}
]
}
}
Output:
rownum querydate issueid ticker companyname issuetitle filerid
0 1 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1089387
1 2 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1086893
2 3 7/31/2019 3:19 PM 472683 AAPL APPLE INC COM 1085803
Upvotes: 0
Reputation: 5500
You can use pandas.DataFrame.from_dict
but you need to remove all unnecessary data in your data. Actually, you only want to keep the field
value and value
per row. You can do it with list comprehension:
data = [{ field["field"]:field["value"] for field in row['values']
} for row in data['result']["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1089387},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1086893},
# {
# 'querydate': '7/31/2019 3:19 PM',
# 'issueid': 472683,
# 'ticker': 'AAPL',
# 'companyname': 'APPLE INC',
# 'issuetitle': 'COM',
# 'filerid': 1085803
# }]
Once you have this dictionary, you can call from_dict
method:
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM AAPL
If you want to get the rownum
as a column (or index):
data = [{**{field["field"]:field["value"] for field in row['values']}, **{'rownum': row["rownum"]}} for row in data['result']["rows"]]
df = pd.DataFrame.from_dict(data)
print(df)
# companyname filerid issueid issuetitle querydate rownum ticker
# 0 APPLE INC 1089387 472683 COM 7/31/2019 3:19 PM 1 AAPL
# 1 APPLE INC 1086893 472683 COM 7/31/2019 3:19 PM 2 AAPL
# 2 APPLE INC 1085803 472683 COM 7/31/2019 3:19 PM 3 AAPL
Upvotes: 1