RexIncognito

Reputation: 43

Group pandas dataframe by a nested dictionary key

I have a pandas dataframe where one of the columns holds dictionaries. This is an example dataframe:

import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 
                   'b': [4,5,6], 
                   'version': [{'major': 7, 'minor':1}, 
                               {'major':8, 'minor': 5},
                               {'major':7, 'minor':2}] })

df:

   a  b                   version
0  1  4  {'minor': 1, 'major': 7}
1  2  5  {'minor': 5, 'major': 8}
2  3  6  {'minor': 2, 'major': 7}

I am looking for a way to group the dataframe by one of the dictionary's keys; in this case, to group the df dataframe by the major key in the version column.

I have tried a few different things, from passing the dictionary key to the dataframe groupby function, `df.groupby(['version']['major'])`, which doesn't work since major is not one of the dataframe's labels, to assigning version to the dataframe index, but nothing has worked so far. I'm also trying to flatten the dictionaries into additional columns in the dataframe itself, but this seems to have its own issues.

Any idea?

P.S. Sorry about formatting, it's my first stackoverflow question.

Upvotes: 4

Views: 1256

Answers (2)

piRSquared

Reputation: 294258

Option 1

df.groupby(df.version.apply(lambda x: x['major'])).size()

version
7    2
8    1
dtype: int64

df.groupby(df.version.apply(lambda x: x['major']))[['a', 'b']].sum()


Option 2

df.groupby(df.version.apply(pd.Series).major).size()

major
7    2
8    1
dtype: int64

df.groupby(df.version.apply(pd.Series).major)[['a', 'b']].sum()

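Both options work because groupby accepts any Series aligned with df as the grouping key. As a rough sketch of the same idea (not part of the original answer), the key can also be built with Series.map and operator.itemgetter. The variable names major, sizes and sums are only illustrative, and the example dataframe is the one from the question.

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
    'version': [{'major': 7, 'minor': 1},
                {'major': 8, 'minor': 5},
                {'major': 7, 'minor': 2}],
})

# Pull the 'major' value out of each dictionary; the result is a Series
# that shares df's index, so it can be used directly as a grouping key.
major = df['version'].map(itemgetter('major')).rename('major')

# Number of rows per major version
sizes = df.groupby(major).size()

# Sum of the numeric columns per major version
sums = df.groupby(major)[['a', 'b']].sum()

print(sizes)
print(sums)
```

Selecting `[['a', 'b']]` before summing keeps the dict column out of the aggregation.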

Upvotes: 4

MaxU - stand with Ukraine

Reputation: 210842

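You can do it this way (the numeric columns are selected explicitly, since newer pandas versions may raise a TypeError when asked to sum the dict column):

In [15]: df.assign(major=df.version.apply(pd.Series).major).groupby('major')[['a', 'b']].sum()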
Out[15]:
       a   b
major
7      4  10
8      2   5
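The question also mentions flattening the dictionaries into extra columns. A minimal sketch of one way to do that, assuming every row's dictionary has the same keys, is to build a second dataframe from the list of dictionaries and join it back on the index. The names version_cols, flat and result are hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
    'version': [{'major': 7, 'minor': 1},
                {'major': 8, 'minor': 5},
                {'major': 7, 'minor': 2}],
})

# One column per dictionary key, keeping the original row index
version_cols = pd.DataFrame(df['version'].tolist(), index=df.index)

# Swap the dict column for the flattened 'major' and 'minor' columns
flat = df.drop(columns='version').join(version_cols)

# With 'major' as a regular column, group by its name
result = flat.groupby('major')[['a', 'b']].sum()
print(result)
```

Once the keys are ordinary columns, grouping by minor or by both keys at once is just a matter of passing different column names to groupby.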

Upvotes: 2
