aliki43

Reputation: 161

Getting pandas dataframe from list of nested dictionaries

I am new to Python so this may be pretty straightforward, but I have not been able to find a good answer for my problem after looking for a while. I am trying to create a Pandas dataframe from a list of dictionaries.

My list of nested dictionaries is the following:

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'}, 
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'}, 
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}}, 
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'}, 
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'}, 
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

So each dictionary in the list has 3 keys (0, 1 and 2), and each key maps to an inner dictionary with values for a, b, c and d.

Putting these into a dataframe using data = pd.DataFrame(my_list) returns something unusable, as each cell has information on a, b, c and d in it.
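To make the problem concrete, here is a minimal reproduction, assuming pandas is imported as pd. It only inspects the shape and one cell, to show that the inner dictionaries end up stored whole inside the cells:

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# One row per outer dictionary, one column per key 0, 1, 2.
data = pd.DataFrame(my_list)

# Each cell holds an entire inner dictionary rather than a single value.
first_cell = data.iloc[0, 0]
print(data.shape)
print(type(first_cell))
```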

I want to end up with a dataframe that looks like this:

 name| a  | b  | c | d 
0    | 23 | 15 | 5 | -1 
1    | 5  | 6  | 7 |  9 
2    | 9  | 15 | 5 |  7 
0    | 5  |249 | 92| -4 
1    |51  | 5  | 34|  1 
2    | 3  | 8  | 3 | 11 

Is this possible?

Upvotes: 6

Views: 6457

Answers (5)

AJ AJ

Reputation: 325

[pd.DataFrame.from_dict(l, orient='index') for l in my_list]

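According to the documentation, use orient='index' when the keys of the dictionary should become the rows. Note that this produces a list of dataframes, one per dictionary in my_list, rather than a single dataframe.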
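A possible way to get from that list of dataframes to the single frame asked for is to stack them with pd.concat. This is a sketch under the assumption that the repeated 0, 1, 2 labels should be kept as the index; the variable names frames and df are only illustrative:

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# One dataframe per outer dictionary; outer keys become the row labels.
frames = [pd.DataFrame.from_dict(l, orient='index') for l in my_list]

# Stack the frames vertically; the index keeps the repeated 0, 1, 2 labels.
df = pd.concat(frames)
df.index.name = 'name'
print(df)
```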

Upvotes: 0

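from pandas import DataFrame

def flat_dict(data: dict, prefix=''):
    """Flatten nested dictionaries, joining nested keys with '_'."""
    result = dict()

    for key in data:
        # Keys may be integers (0, 1, 2 here), so convert before joining.
        if prefix:
            field = prefix + '_' + str(key)
        else:
            field = str(key)

        if isinstance(data[key], dict):
            # Pass the full path built so far, not just the current key.
            result.update(flat_dict(data[key], field))
        else:
            result[field] = data[key]

    return result

# One flat record per dictionary in my_list, with keys such as '0_a', '0_b', ...
refactor_data = [flat_dict(x) for x in my_list]

df = DataFrame(refactor_data)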

Upvotes: 0

juanpa.arrivillaga

Reputation: 95948

You can munge the list of dictionaries to be acceptable to a DataFrame constructor:

In [4]: pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])
Out[4]:
    a    b   c   d  name
0  23   15   5  -1     0
1   5    6   7   9     1
2   9   15   5   7     2
3   5  249  92  -4     0
4  51    5  34   1     1
5   3    8   3  11     2

In [5]: df = pd.DataFrame.from_records([{'name': k, **v} for d in my_list for k,v in d.items()])

In [6]: df.set_index('name',inplace=True)

In [7]: df
Out[7]:
       a    b   c   d
name
0     23   15   5  -1
1      5    6   7   9
2      9   15   5   7
0      5  249  92  -4
1     51    5  34   1
2      3    8   3  11

This requires Python 3.5 or later for {'name': 'something', **rest} to work. It is merely shorthand for the following:

In [13]: reshaped = []
    ...: for d in my_list:
    ...:     for k, v in d.items():
    ...:         new = {'name': k}
    ...:         new.update(v)
    ...:         reshaped.append(new)
    ...:

In [14]: reshaped
Out[14]:
[{'a': '23', 'b': '15', 'c': '5', 'd': '-1', 'name': 0},
 {'a': '5', 'b': '6', 'c': '7', 'd': '9', 'name': 1},
 {'a': '9', 'b': '15', 'c': '5', 'd': '7', 'name': 2},
 {'a': '5', 'b': '249', 'c': '92', 'd': '-4', 'name': 0},
 {'a': '51', 'b': '5', 'c': '34', 'd': '1', 'name': 1},
 {'a': '3', 'b': '8', 'c': '3', 'd': '11', 'name': 2}]

Upvotes: 1

Igor Raush

Reputation: 15240

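Another solution (written for Python 2 and pandas before 1.0: on Python 3, d.iteritems() becomes d.items(), and DataFrame.from_items was removed in pandas 1.0):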

from itertools import chain
pd.DataFrame.from_items(list(chain.from_iterable(d.iteritems() for d in my_list))).T

In my experiments, this is faster than using pd.concat (especially when the number of "sub-dataframes" is large) at the cost of being more verbose.
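On current pandas, where DataFrame.from_items is no longer available, the same chaining idea can be sketched roughly as follows. This is not the original code, just one possible equivalent; the names pairs, names and rows are illustrative, and no timing claim is made for this variant:

```python
from itertools import chain

import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Flatten into (name, inner_dict) pairs; repeated names are preserved.
pairs = list(chain.from_iterable(d.items() for d in my_list))

# Split the pairs into row labels and row contents.
names, rows = zip(*pairs)

# Each inner dictionary becomes one row, labelled by its outer key.
df = pd.DataFrame(list(rows), index=pd.Index(names, name='name'))
print(df)
```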

Upvotes: 4

DYZ

Reputation: 57033

Easy:

pd.concat([pd.DataFrame(l) for l in my_list],axis=1).T
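For completeness, a self-contained version of the one-liner. The values in my_list are strings, so the final astype(int) step is an optional extra, added here on the assumption that numeric columns are wanted; it is not part of the original one-liner:

```python
import pandas as pd

my_list = [{0: {'a': '23', 'b': '15', 'c': '5', 'd': '-1'},
            1: {'a': '5', 'b': '6', 'c': '7', 'd': '9'},
            2: {'a': '9', 'b': '15', 'c': '5', 'd': '7'}},
           {0: {'a': '5', 'b': '249', 'c': '92', 'd': '-4'},
            1: {'a': '51', 'b': '5', 'c': '34', 'd': '1'},
            2: {'a': '3', 'b': '8', 'c': '3', 'd': '11'}}]

# Each pd.DataFrame(l) has a, b, c, d as rows and 0, 1, 2 as columns;
# concatenating side by side and transposing gives one row per inner dict.
df = pd.concat([pd.DataFrame(l) for l in my_list], axis=1).T

# Optional: convert the string values to integers.
numeric = df.astype(int)
print(numeric)
```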

Upvotes: 10
