Reputation: 81
Let's say I have an API response as:
{
"fact": {
"UP": [{
"SCODE": "CNB",
"SNAME": "Kanpur Central"
}, {
"SCODE": "JHS",
"SNAME": "Jhansi Junction"
}],
"MP": [{
"SCODE": "BPL",
"SNAME": "Bhopal Junction"
}, {
"SCODE": "JBP",
"SNAME": "Jabalpur Junction"
}]
}
}
I have to convert this into a dataframe which looks like below (expected output):
fact SCODE SNAME
UP CNB Kanpur Central
UP JHS Jhansi Junction
MP BPL Bhopal Junction
MP JBP Jabalpur Junction
My effort: I tried using json_normalize() but didn't reaches to expected output:
pd.json_normalize(response).apply(pd.Series.explode)
Upvotes: 1
Views: 117
Reputation: 28644
Not as efficient as working directly within dictionaries (which the chosen answer does quite well):
data = {
"fact": {
"UP": [{
"SCODE": "CNB",
"SNAME": "Kanpur Central"
}, {
"SCODE": "JHS",
"SNAME": "Jhansi Junction"
}],
"MP": [{
"SCODE": "BPL",
"SNAME": "Bhopal Junction"
}, {
"SCODE": "JBP",
"SNAME": "Jabalpur Junction"
}]
}
}
keys = data['fact']
(pd.concat([jn(data['fact'][key]) for key in keys],
keys = keys)
.droplevel(-1)
.rename_axis(index='fact')
.reset_index()
)
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
Upvotes: 1
Reputation: 1875
response
from the OP.json_normalize
works with list of dictionaries and the fact
must be included there:new_response = [{"fact": rfact, **r} for rfact in response["fact"] for r in response["fact"][rfact]]
Finally, you just need to apply the function:
final_result = pd.json_normalize(new_response)
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
Upvotes: 1
Reputation: 18416
df = pd.DataFrame.from_dict(d)
df.index.name = df.columns[0]
df = (df.append(df['fact'].explode().apply(pd.Series)).drop(columns=['fact'])
.dropna()
.reset_index()
)
OUTPUT:
fact SCODE SNAME
0 MP BPL Bhopal Junction
1 MP JBP Jabalpur Junction
2 UP CNB Kanpur Central
3 UP JHS Jhansi Junction
Upvotes: 0
Reputation: 35636
One option is to reshape with python:
df = pd.DataFrame([{'fact': k, **item}
for k, lst in response['fact'].items()
for item in lst])
fact SCODE SNAME
0 UP CNB Kanpur Central
1 UP JHS Jhansi Junction
2 MP BPL Bhopal Junction
3 MP JBP Jabalpur Junction
A pandas
option via explode
+ apply
pd.Series
:
df = (
pd.DataFrame(response)['fact']
.explode()
.apply(pd.Series)
.rename_axis('fact')
.reset_index()
)
fact SCODE SNAME
0 MP BPL Bhopal Junction
1 MP JBP Jabalpur Junction
2 UP CNB Kanpur Central
3 UP JHS Jhansi Junction
Upvotes: 5