Reputation: 161
I am new to Python so this may be pretty straightforward, but I have not been able to find a good answer for my problem after looking for a while. I am trying to create a Pandas dataframe from a list of dictionaries.
My list of nested dictionaries is the following:
my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
{0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]
So each dictionary in the list has 3 keys (0, 1 and 2), and each key maps to an inner dictionary with the values for a, b, c and d.
Putting these into a dataframe using data = pd.DataFrame(my_list)
returns something unusable, as each cell has information on a, b, c and d in it.
I want to end up with a dataframe that looks like this:
name | a  | b   | c  | d
0    | 23 | 15  | 5  | -1
1    | 5  | 6   | 7  | 9
2    | 9  | 15  | 5  | 7
0    | 5  | 249 | 92 | -4
1    | 51 | 5   | 34 | 1
2    | 3  | 8   | 3  | 11
Is this possible?
Upvotes: 6
Views: 6457
Reputation: 325
[pd.DataFrame.from_dict(l, orient='index') for l in my_list]
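The documentation says to use orient='index' when you want the keys of the dictionary to become rows. Note that this produces a list with one dataframe per dictionary in my_list, so the frames still need to be combined (for example with pd.concat) to get a single dataframe.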
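To end up with the single dataframe the question asks for, the per-dictionary frames can be concatenated. This is a minimal sketch, assuming my_list is exactly the list from the question; naming the index 'name' is my own choice to match the desired layout.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# One dataframe per outer dictionary; the keys 0, 1, 2 become the row index.
frames = [pd.DataFrame.from_dict(d, orient='index') for d in my_list]

# Stack the frames vertically; the index repeats (0, 1, 2, 0, 1, 2).
df = pd.concat(frames)
df.index.name = 'name'  # assumption: label the index like the desired output
print(df)
```

The values remain strings here, since that is how they are stored in my_list.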
Upvotes: 0
Reputation: 382
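from pandas import DataFrame

def flat_dict(data: dict, prefix=''):
    """Flatten nested dictionaries into one level, joining keys with '_'."""
    result = dict()
    for key in data:
        # Keys may be ints (as in my_list), so format them before joining.
        field = f'{prefix}_{key}' if prefix else str(key)
        if isinstance(data[key], dict):
            # Pass the full path down so deeper levels keep their prefix.
            result.update(flat_dict(data[key], field))
        else:
            result[field] = data[key]
    return result

# One flattened row per dictionary in my_list, with columns '0_a', '0_b', ...
refactor_data = [flat_dict(x) for x in my_list]
df = DataFrame(refactor_data)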
Upvotes: 0
Reputation: 95948
You can munge the list of dictionaries to be acceptable to a DataFrame constructor:
In [4]: pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
Out[4]:
a b c d name
0 23 15 5 -1 0
1 5 6 7 9 1
2 9 15 5 7 2
3 5 249 92 -4 0
4 51 5 34 1 1
5 3 8 3 11 2
In [5]: df = pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
In [6]: df.set_index('name',inplace=True)
In [7]: df
Out[7]:
a b c d
name
0 23 15 5 -1
1 5 6 7 9
2 9 15 5 7
0 5 249 92 -4
1 51 5 34 1
2 3 8 3 11
The {'name': 'something', **rest} syntax requires Python 3.5 or later (dictionary unpacking, PEP 448).
It is merely a shorthand for the following:
In [13]: reshaped = []
    ...: for d in my_list:
    ...:     for k, v in d.items():
    ...:         new = {'name': k}
    ...:         new.update(v)
    ...:         reshaped.append(new)
    ...:
In [14]: reshaped
Out[14]:
[{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
{'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
{'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
{'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
{'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
{'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]
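For reference, the same approach as a standalone script rather than an interactive session. This is a sketch under two assumptions: my_list is the list from the question, and every value is an integer written as a string, so the final astype(int) conversion (my addition, not part of the answer above) is safe.

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# One flat record per inner dictionary, tagged with its original key.
records = [{'name': k, **v} for d in my_list for k, v in d.items()]
df = pd.DataFrame.from_records(records).set_index('name')

# Assumption: all values are integer strings, as in the question's data.
df = df.astype(int)
print(df)
```

Reassigning the result of set_index avoids inplace=True; either form gives the same dataframe.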
Upvotes: 1
Reputation: 15240
Another solution:
from itertools import chain
pd.DataFrame.from_items(list(chain.from_iterable(d.iteritems() for d in my_list))).T
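In my experiments, this is faster than using pd.concat
(especially when the number of "sub-dataframes" is large), at the cost of being more verbose. Note that the snippet targets Python 2 and older pandas: on Python 3, d.iteritems() becomes d.items(), and DataFrame.from_items was deprecated in pandas 0.23 and removed in pandas 1.0.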
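On Python 3 with a current pandas, a similar chain-based flattening can be written without from_items. This is a rough sketch, not the code measured in the answer above, and I have not benchmarked it against pd.concat; it assumes my_list is the list from the question, and the names pairs, index and rows are my own.

```python
from itertools import chain

import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Flatten all (key, inner_dict) pairs across the list, preserving order.
pairs = list(chain.from_iterable(d.items() for d in my_list))
index = [k for k, _ in pairs]   # outer keys: 0, 1, 2, 0, 1, 2
rows = [v for _, v in pairs]    # inner dictionaries, one per row

# A list of dictionaries becomes one row each; the keys become columns.
df = pd.DataFrame(rows, index=index)
df.index.name = 'name'
print(df)
```

Because each inner dictionary becomes a row directly, no transpose is needed.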
Upvotes: 4