Reputation: 4548
I often want a dataframe of counts for how many members are in each group after a groupby operation in pandas. I have a verbose way of doing it with size and reset index and rename, but I'm sure there is a better way.
Here's an example of what I'd like to do:
import pandas as pd
import numpy as np
np.random.seed(0)
colors = ['red','green','blue']
cdf = pd.DataFrame({
'color1':np.random.choice(colors,10),
'color2':np.random.choice(colors,10),
})
print(cdf)
#better way to do next line? (somehow use agg?)
gb_count = cdf.groupby(['color1','color2']).size().reset_index().rename(columns={0:'num'})
print(gb_count)
#cdf.groupby(['color1','color2']).count() #<-- this doesn't work
Final output:
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
Upvotes: 0
Views: 64
Reputation: 26676
Another way is to reset the grouper_index
after sending it to_frame(with preferred column name)
in an agg
operation.
gb_count = cdf.groupby(['color1','color2']).agg('size').to_frame('num').reset_index()
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
Upvotes: 1
Reputation: 42916
To avoid getting your MultiIndex, use as_index=False
:
cdf.groupby(['color1','color2'], as_index=False).size()
color1 color2 size
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
If you explicitly want to name your new column num
. You can use reset_index
with name=..
since groupby
will return a series:
cdf.groupby(['color1','color2']).size().reset_index(name='num')
color1 color2 num
0 blue green 1
1 blue red 1
2 green blue 3
3 red green 4
4 red red 1
Upvotes: 2