Reputation: 165
I have some data containing nested dictionaries like below:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
If we convert it to pandas DataFrame,
import pandas as pd
result_dataframe = pd.DataFrame(mylist)
print(result_dataframe)
It will output:
a b
0 1 {'c': 2, 'd': 3}
1 3 {'c': 4, 'd': 3}
I want to convert the list of dictionaries and ignore the key of the nested dictionary. My code is below:
new_dataframe = result_dataframe.drop(columns=["b"])
b_dict_list = [document["b"] for document in mylist]
b_df = pd.DataFrame(b_dict_list)
frames = [new_dataframe, b_df]
total_frame = pd.concat(frames, axis=1)
The total_frame is which I want:
a c d
0 1 2 3
1 3 4 3
But I think my code is a little complicated. Is there any simple way to deal with this problem? Thank you.
Upvotes: 6
Views: 18097
Reputation: 311
I had a similar problem to this one. I used pd.json_normalize(x)
and it worked. The only difference is that the column names of the data frame will look a little different.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
df = pd.json_normalize(mylist)
print(df)
Output:
a | b.c | b.d | |
---|---|---|---|
0 | 1 | 2 | 3 |
1 | 3 | 4 | 3 |
Upvotes: 15
Reputation: 3211
I prefer to write a function that accepts your mylist
and converts it 1 nested layer down and returns a dictionary. This has the added advantage of not requiring you to 'manually' know what key like b
to convert. So this function works for all nested keys 1 layer down.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
import pandas as pd
def dropnested(alist):
outputdict = {}
for dic in alist:
for key, value in dic.items():
if isinstance(value, dict):
for k2, v2, in value.items():
outputdict[k2] = outputdict.get(k2, []) + [v2]
else:
outputdict[key] = outputdict.get(key, []) + [value]
return outputdict
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d
#0 1 2 3
#1 3 4 3
If you try:
mylist = [{"a": 1, "b": {"c": 2, "d":3}, "g": {"e": 2, "f":3}},
{"a": 3, "z": {"c": 4, "d":3}, "e": {"e": 2, "f":3}}]
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d e f
#0 1 2 3 2 3
#1 3 4 3 2 3
We can see here that it converts keys b
,g
,z
,e
without issue, as opposed to having to define each and every nested key name to convert
Upvotes: 2
Reputation: 862521
Use dict comprehension
with pop
for extract value b
and merge dictionaries:
a = [{**x, **x.pop('b')} for x in mylist]
print (a)
[{'a': 1, 'c': 2, 'd': 3}, {'a': 3, 'c': 4, 'd': 3}]
result_dataframe = pd.DataFrame(a)
print(result_dataframe)
a c d
0 1 2 3
1 3 4 3
Another solution, thanks @Sandeep Kadapa :
a = [{'a': x['a'], **x['b']} for x in mylist]
#alternative
a = [{'a': x['a'], **x.get('b')} for x in mylist]
Upvotes: 9
Reputation: 75080
Or by applying pd.Series()
to your method:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
result_dataframe = pd.DataFrame(mylist)
result_dataframe.drop('b',1).join(result_dataframe.b.apply(pd.Series))
a c d
0 1 2 3
1 3 4 3
Upvotes: 2