Getting pandas dataframe from nested dictionaries?

Question

I am new to Python and, after searching for a while, I have not found a good answer to my problem. I am trying to create a pandas DataFrame from a nested dictionary whose values are either dictionaries or lists of dictionaries.

My nested dictionary is the following:

{'category_1': [{'a': '151',
   'b': '116',
   'c': '86'}],
 'category_2': [{'d': '201',
   'e': '211',
   'f': '252'},
  {'d': '-1',
   'e': '-9',
   'f': '-7'}],
 'category_3': {'g': 'Valid',
   'h': None,
   'i': False,
   'j': False},
 'category_4': {'k': None,
   'l': None,
   'm': None,
   'n': None}}

And my output should be

    a    b    c   d       e       f       g      h     i      j      k     l     m     n
0   151  116  86  201,-1  211,-9  252,-7  Valid  None  False  False  None  None  None  None

What I tried: I can handle categories 1, 3 and 4, but I couldn't figure out category 2. I tried concat and a nested for loop to get it:

ex=pd.concat([pd.Series(d) for d in (eg1)], axis=1).T

Then I tried merging the results. As I said, I couldn't figure out how to do it for the whole dictionary!

James Dellinger · Accepted Answer

I wrote a short recursive function that returns a series, or a concatenation of several series if one of the keys in your dict (e.g. category_2) contains a list of multiple dicts.

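import pandas as pd

def expand(x):
    # A dict becomes a Series indexed by its keys.
    if isinstance(x, dict):
        return pd.Series(x)
    # A list of dicts becomes one Series; repeated keys give duplicate index labels.
    elif isinstance(x, list):
        return pd.concat([expand(i) for i in x])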
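To make the "concatenation of several series" case concrete, here is a small sketch of what the function returns for the category_2 list from the question. The function is repeated (in an equivalent form) so the snippet runs on its own; the names `category_2` and `s` are just for illustration.

```python
import pandas as pd


def expand(x):
    # Equivalent to the function above, repeated so this snippet is self-contained.
    if isinstance(x, dict):
        return pd.Series(x)
    elif isinstance(x, list):
        return pd.concat([expand(i) for i in x])


# The category_2 value from the question: a list holding two dicts with the same keys.
category_2 = [{'d': '201', 'e': '211', 'f': '252'},
              {'d': '-1', 'e': '-9', 'f': '-7'}]

s = expand(category_2)
print(s.index.tolist())  # ['d', 'e', 'f', 'd', 'e', 'f']
print(s.tolist())        # ['201', '211', '252', '-1', '-9', '-7']
```

The duplicated labels in the index are what later allow the values for d, e and f to be grouped and joined into one cell each.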

If I start with the dictionary that you pasted in your example above:

d = {'category_1': [{'a': '151',
   'b': '116',
   'c': '86'}],
 'category_2': [{'d': '201',
   'e': '211',
   'f': '252'},
  {'d': '-1',
   'e': '-9',
   'f': '-7'}],
 'category_3': {'g': 'Valid',
   'h': None,
   'i': False,
   'j': False},
 'category_4': {'k': None,
   'l': None,
   'm': None,
   'n': None}}

Then it's just a matter of concatenating all the series created by the recursive method I wrote:

output = pd.concat([expand(value) for key, value in d.items()])

The next step is to merge any duplicate index labels so that their values appear in a single cell, separated by commas. I also reshape the series into a df with one row and several columns:

output = pd.DataFrame(output.groupby(output.index).apply(lambda x: ','.join(x.astype(str)))).T

This results in a dataframe that matches your desired output:

output

    a    b    c   d       e       f       g      h     i      j      k     l     m     n
0   151  116  86  201,-1  211,-9  252,-7  Valid  None  False  False  None  None  None  None
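For anyone who wants to run everything in one go, here is a self-contained sketch of the steps above. Two details are my own assumptions rather than part of the answer as written: `sort=False` is passed to `groupby` so the columns keep the order in which the keys first appear (by default `groupby` sorts the keys, which happens to give the same order for this data), and the values are joined with `map(str, ...)` instead of `astype(str)` so that `None` and `False` are turned into text by plain Python. The names `joined` and `df` are just for illustration.

```python
import pandas as pd


def expand(x):
    # Dict -> Series; list of dicts -> concatenated Series (may repeat index labels).
    if isinstance(x, dict):
        return pd.Series(x)
    elif isinstance(x, list):
        return pd.concat([expand(i) for i in x])


d = {'category_1': [{'a': '151', 'b': '116', 'c': '86'}],
     'category_2': [{'d': '201', 'e': '211', 'f': '252'},
                    {'d': '-1', 'e': '-9', 'f': '-7'}],
     'category_3': {'g': 'Valid', 'h': None, 'i': False, 'j': False},
     'category_4': {'k': None, 'l': None, 'm': None, 'n': None}}

# One long Series holding every inner key/value pair.
output = pd.concat([expand(value) for value in d.values()])

# Join values that share an index label; sort=False keeps first-seen key order
# (an assumption added here, not part of the original one-liner).
joined = output.groupby(output.index, sort=False).apply(
    lambda x: ','.join(map(str, x))
)

# Turn the Series into a single-row DataFrame.
df = joined.to_frame().T
print(df)
```

Note that every cell in the result is a string, so `None` and `False` come out as the text 'None' and 'False'. If you need the original types for keys that are not duplicated, you would have to treat those keys separately.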
