Reputation: 876
I have the following list of dictionaries that I want to convert to a DataFrame. I tried using a MultiIndex, but I could only convert part of the data.
response = [{
    "name": "xyz",
    "empId": "007",
    "details": [{
        "address": [{
            "street": "x street",
            "city": "x city"
        }, {
            "street": "xx street",
            "city": "xx city"
        }],
        "country": "xxz country"
    }, {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }]
}]
I managed to get the inner list of dictionaries to a dataframe with the following code:
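import pandas as pd

# Assumption: 'details' is the inner list of the first (only) record
details = response[0]['details']
frames = []
for i in details:
    Country = i['country']
    street = []
    city = []
    # Index each address by its country and a 1-based serial number
    index = pd.MultiIndex.from_arrays(
        [[Country] * len(i['address']), list(range(1, len(i['address']) + 1))],
        names=['Country', 'SL No'])
    df = pd.DataFrame(columns=["Street", "City"], index=index)
    if i['address']:
        for row in i['address']:
            street.append(row['street'])
            city.append(row['city'])
        df["Street"] = street
        df["City"] = city
    frames.append(df)
df_final = pd.concat(frames)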
Output obtained:
Country      SL No  Street     City
xxz country  1      x street   x city
             2      xx street  xx city
yyz country  1      y street   y city
             2      yy street  yy city
How can I convert the list of dictionaries to a dataframe while keeping all the information?
The final output that I want:
Name  EmpId  Country      Street     City
xyz   007    xxz country  x street   x city
                          xx street  xx city
             yyz country  y street   y city
                          yy street  yy city
Upvotes: 3
Views: 165
Reputation: 863741
Use json_normalize with DataFrame.set_index:
df = pd.json_normalize(response,
                       record_path=['details', 'address'],
                       meta=['name', 'empId', ['address', 'country']]
                       )
df = df.set_index(['name', 'empId', 'address.country'])
print(df)
                              street     city
name  empId  address.country
xyz   007    xxz country      x street   x city
             xxz country      xx street  xx city
             yyz country      y street   y city
             yyz country      yy street  yy city
For pandas versions older than 1.0, use:
df = pd.io.json.json_normalize(response,
                               record_path=['details', 'address'],
                               meta=['name', 'empId', ['address', 'country']]
                               )
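If the result should carry the headers requested in the question, the columns can be renamed before the index is set. The following is only a minimal sketch, assuming the single-record response from the question. Spelling the meta path as ['details', 'country'] (which produces a details.country column) and the capitalised names Name, EmpId, Country, Street and City are choices made for this illustration, not part of the answer above.

```python
import pandas as pd

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
        {"address": [{"street": "y street", "city": "y city"},
                     {"street": "yy street", "city": "yy city"}],
         "country": "yyz country"},
    ],
}]

# One row per address; name, empId and the enclosing country are
# repeated on every row as metadata.
flat = pd.json_normalize(
    response,
    record_path=["details", "address"],
    meta=["name", "empId", ["details", "country"]],
)

# Rename to the headers shown in the question (assumed naming).
flat = flat.rename(columns={
    "name": "Name",
    "empId": "EmpId",
    "details.country": "Country",
    "street": "Street",
    "city": "City",
})

# Move the identifying columns into a MultiIndex.
indexed = flat.set_index(["Name", "EmpId", "Country"])
print(indexed)
```

Keeping flat around as well is deliberate: it is the same data with an ordinary integer index, which is often easier to filter or export than the MultiIndex version.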
EDIT:
I tested this with multiple records and it works well:
response = [{
    "name": "xyz",
    "empId": "007",
    "details": [{
        "address": [{
            "street": "x street",
            "city": "x city"
        }, {
            "street": "xx street",
            "city": "xx city"
        }],
        "country": "xxz country"
    }, {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }]
}, {
    "name": "xyz1",
    "empId": "0071",
    "details": [{
        "address": [{
            "street": "x street1",
            "city": "x city1"
        }, {
            "street": "xx stree1t",
            "city": "xx city1"
        }],
        "country": "xxz country"
    }, {
        "address": [{
            "street": "y street",
            "city": "y city"
        }, {
            "street": "yy street",
            "city": "yy city"
        }],
        "country": "yyz country"
    }]
}]
df = pd.json_normalize(response,
                       record_path=['details', 'address'],
                       meta=['name', 'empId', ['address', 'country']]
                       )
df = df.set_index(['name', 'empId', 'address.country'])
print(df)
                              street      city
name  empId  address.country
xyz   007    xxz country      x street    x city
             xxz country      xx street   xx city
             yyz country      y street    y city
             yyz country      yy street   yy city
xyz1  0071   xxz country      x street1   x city1
             xxz country      xx stree1t  xx city1
             yyz country      y street    y city
             yyz country      yy street   yy city
Upvotes: 3
Reputation: 28422
As far as I know, there is no easy way to do this, since your data contains multiple levels of lists. Although a bit convoluted, the following should work. The code repeatedly uses explode to turn each list into rows and json_normalize to turn the resulting dictionaries into columns.
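df = pd.DataFrame.from_records(response)
# One row per entry in 'details', then expand each dict into columns
df = df.explode('details', ignore_index=True)
df = pd.concat([df, pd.json_normalize(df['details'])], axis=1)
# Same again for the nested 'address' lists
df = df.explode('address', ignore_index=True)
df = pd.concat([df, pd.json_normalize(df['address'])], axis=1)
# The exploded source columns are no longer needed
df = df.drop(columns=['details', 'address'])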
Result:
   name  empId  country      street     city
0  xyz   007    xxz country  x street   x city
1  xyz   007    xxz country  xx street  xx city
2  xyz   007    yyz country  y street   y city
3  xyz   007    yyz country  yy street  yy city
Note: In pandas versions older than 1.1.0, explode does not have the ignore_index parameter. Instead, call reset_index(drop=True) after the explode.
In addition, in older pandas versions you need to use pd.io.json.json_normalize instead of pd.json_normalize.
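The two fallbacks from the note can be combined as in the sketch below. It assumes the single-record response from the question. The try/except import and the loop over the two column names are conveniences added for this illustration, not part of the code above.

```python
import pandas as pd

try:
    from pandas import json_normalize          # pandas 1.0 and newer
except ImportError:
    from pandas.io.json import json_normalize  # older pandas

response = [{
    "name": "xyz",
    "empId": "007",
    "details": [
        {"address": [{"street": "x street", "city": "x city"},
                     {"street": "xx street", "city": "xx city"}],
         "country": "xxz country"},
        {"address": [{"street": "y street", "city": "y city"},
                     {"street": "yy street", "city": "yy city"}],
         "country": "yyz country"},
    ],
}]

df = pd.DataFrame.from_records(response)
for col in ["details", "address"]:
    # explode() repeats the original row labels, so rebuild a clean
    # 0..n-1 index instead of relying on ignore_index=True.
    df = df.explode(col).reset_index(drop=True)
    # Expand the dicts now held in this column into their own columns.
    df = pd.concat([df, json_normalize(df[col].tolist())], axis=1)

# The exploded source columns are no longer needed.
df = df.drop(columns=["details", "address"])
print(df)
```

Resetting the index before the concat matters here: with axis=1 the frames are aligned on their row labels, and the labels left behind by explode are repeated.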
Upvotes: 0