Group pandas dataframe by a nested dictionary key

Question

I have a pandas dataframe where one of the columns is dictionary type. This is an example dataframe:

import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 
                   'b': [4,5,6], 
                   'version': [{'major': 7, 'minor':1}, 
                               {'major':8, 'minor': 5},
                               {'major':7, 'minor':2}] })

df:

   a  b                   version
0  1  4  {'minor': 1, 'major': 7}
1  2  5  {'minor': 5, 'major': 8}
2  3  6  {'minor': 2, 'major': 7}

I am looking for a way to group the dataframe by one of that dictionary key; in this case to group the df dataframe by the major key in version label.

I have tried a few different stuff, from passing the dictionary key to dataframe groupby function, `df.groupby(['version']['major']), which doesn't work since major is not part of dataframe label, to assigning version to the dataframe index, but nothing works so far. I'm also trying to flatten the dictionaries as additional columns in the dataframe itself, but this seems to have its own issue.

Any idea?

P.S. Sorry about formatting, it's my first stackoverflow question.

piRSquared · Accepted Answer

Option 1

df.groupby(df.version.apply(lambda x: x['major'])).size()

version
7    2
8    1
dtype: int64

df.groupby(df.version.apply(lambda x: x['major']))[['a', 'b']].sum()

Option 2

df.groupby(df.version.apply(pd.Series).major).size()

major
7    2
8    1
dtype: int64

df.groupby(df.version.apply(pd.Series).major)[['a', 'b']].sum()

Group pandas dataframe by a nested dictionary key

Answers (2)

Related Questions