Reputation: 3713
Given the following data frame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'group':['s','s','s','p','p','p'],
'section':['a','b','b','a','a','b']
})
group section
0 s a
1 s b
2 s b
3 p a
4 p a
5 p b
I'd like a count of the number of sections per group and the maximum number of rows per section for each group. Like this:
group section count max min
s 2 2 1
p 2 2 1
Upvotes: 0
Views: 130
Reputation: 863166
IIUC you can use:
import pandas as pd
import numpy as np
df = pd.DataFrame({'group':['s','s','s','s','p','p','p','p','p'],
'section':['b','b','b','a','a','b','a','a','b']
})
print (df)
group section
0 s b
1 s b
2 s b
3 s a
4 p a
5 p b
6 p a
7 p a
8 p b
print (df.groupby(['group', 'section']).size() )
group section
p a 3
b 2
s a 1
b 3
dtype: int64
print (df.groupby(['group', 'section']).size().groupby(level=1).agg([len, min, max]) )
len min max
section
a 2 1 3
b 2 2 3
Or maybe you can change len
to nunique
:
print (df.groupby(['group', 'section']).size().groupby(level=1).agg(['nunique', min, max]) )
nunique min max
section
a 2 1 3
b 2 2 3
Or in need by first level of multiindex:
print (df.groupby(['group', 'section']).size().groupby(level=0).agg([len, min, max]) )
len min max
group
p 2 2 3
s 2 1 3
print (df.groupby(['group', 'section']).size().groupby(level=0).agg(['nunique', min, max]) )
nunique min max
group
p 2 2 3
s 2 1 3
Upvotes: 1
Reputation: 394159
You can achieve this by grouping on 'group' generate the value_counts and then grouping again:
In [91]:
df.groupby('group')['section'].apply(pd.Series.value_counts).groupby(level=1).agg(['nunique','max','min'])
Out[91]:
nunique max min
a 2 2 1
b 2 2 1
To get close to the desired result you can do this:
In [102]:
df.groupby('group')['section'].apply(pd.Series.value_counts).reset_index().drop('level_1', axis=1).groupby('group',as_index=False).agg(['nunique','max','min'])
Out[102]:
section
nunique max min
group
p 2 2 1
s 2 2 1
Upvotes: 1