Reputation: 43
I have a pandas dataframe where one of the columns holds dictionaries. This is an example dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [4, 5, 6],
                   'version': [{'major': 7, 'minor': 1},
                               {'major': 8, 'minor': 5},
                               {'major': 7, 'minor': 2}]})
df:
   a  b                   version
0  1  4  {'minor': 1, 'major': 7}
1  2  5  {'minor': 5, 'major': 8}
2  3  6  {'minor': 2, 'major': 7}
I am looking for a way to group the dataframe by one of the dictionary keys; in this case, to group df by the major key in the version column.
I have tried a few different things, from passing the dictionary key to the dataframe groupby function, `df.groupby(['version']['major'])`, which doesn't work since major is not one of the dataframe's column labels, to assigning version to the dataframe index, but nothing has worked so far. I'm also trying to flatten the dictionaries into additional columns in the dataframe itself, but this seems to have its own issues.
Any ideas?
P.S. Sorry about formatting, it's my first stackoverflow question.
Upvotes: 4
Views: 1256
Reputation: 294258
Option 1
df.groupby(df.version.apply(lambda x: x['major'])).size()
version
7    2
8    1
dtype: int64
df.groupby(df.version.apply(lambda x: x['major']))[['a', 'b']].sum()
Option 2
df.groupby(df.version.apply(pd.Series).major).size()
major
7    2
8    1
dtype: int64
df.groupby(df.version.apply(pd.Series).major)[['a', 'b']].sum()
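Both options rely on the same thing: groupby accepts a Series as the grouping key and aligns it with the dataframe by index, so the key does not have to be a column of df. Here is a minimal, self-contained sketch of that idea, assuming the example dataframe from the question. operator.itemgetter stands in for the lambda, and the names major, counts and totals are illustrative only, not part of the answer above.

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "version": [
        {"major": 7, "minor": 1},
        {"major": 8, "minor": 5},
        {"major": 7, "minor": 2},
    ],
})

# Build the grouping key: one value per row, sharing df's index.
# Renaming the Series controls the name of the resulting group index.
major = df["version"].map(itemgetter("major")).rename("major")

# Group by the external key; only the numeric columns are aggregated.
counts = df.groupby(major).size()
totals = df.groupby(major)[["a", "b"]].sum()

print(counts)
print(totals)
```

Selecting [['a', 'b']] before aggregating keeps the dictionary column out of the sum, which is why both options above do the same.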
Upvotes: 4
Reputation: 210842
You can do it this way:
In [15]: df.assign(major=df.version.apply(pd.Series).major).groupby('major')[['a', 'b']].sum()
Out[15]:
       a   b
major
7      4  10
8      2   5
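If you would rather flatten the dictionaries into real columns first, as the question mentions trying, one possible sketch uses pd.json_normalize on the list of dictionaries and joins the result back. It assumes the example dataframe from the question and that every dictionary has the same flat keys; flat, wide and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "version": [
        {"major": 7, "minor": 1},
        {"major": 8, "minor": 5},
        {"major": 7, "minor": 2},
    ],
})

# json_normalize takes a list of dicts and returns one column per key.
flat = pd.json_normalize(df["version"].tolist())

# Reuse df's index so the join lines the rows up correctly,
# even if df does not have a default 0..n-1 index.
flat.index = df.index

# Replace the dictionary column with the flattened columns.
wide = df.drop(columns="version").join(flat)

# 'major' is now an ordinary column and can be passed to groupby by name.
result = wide.groupby("major")[["a", "b"]].sum()
print(result)
```

After this, minor is available as a regular column too, so grouping by both keys is just wide.groupby(['major', 'minor']).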
Upvotes: 2