mitoRibo
mitoRibo

Reputation: 4548

Pandas count size of groupby groups idiomatically

I often want a dataframe of counts for how many members are in each group after a groupby operation in pandas. I have a verbose way of doing it with size and reset index and rename, but I'm sure there is a better way.

Here's an example of what I'd like to do:

import pandas as pd
import numpy as np

np.random.seed(0)
colors = ['red','green','blue']

cdf = pd.DataFrame({
    'color1':np.random.choice(colors,10),
    'color2':np.random.choice(colors,10),
})
print(cdf)

#better way to do next line? (somehow use agg?)
gb_count = cdf.groupby(['color1','color2']).size().reset_index().rename(columns={0:'num'})
print(gb_count)

#cdf.groupby(['color1','color2']).count() #<-- this doesn't work

Final output:

    color1  color2  num
0   blue    green   1
1   blue    red 1
2   green   blue    3
3   red green   4
4   red red 1

Upvotes: 0

Views: 64

Answers (2)

wwnde
wwnde

Reputation: 26676

Another way is to reset the grouper_index after sending it to_frame(with preferred column name) in an agg operation.

gb_count = cdf.groupby(['color1','color2']).agg('size').to_frame('num').reset_index()



  color1 color2  num
0   blue  green    1
1   blue    red    1
2  green   blue    3
3    red  green    4
4    red    red    1

Upvotes: 1

Erfan
Erfan

Reputation: 42916

To avoid getting your MultiIndex, use as_index=False:

cdf.groupby(['color1','color2'], as_index=False).size()

  color1 color2  size
0   blue  green     1
1   blue    red     1
2  green   blue     3
3    red  green     4
4    red    red     1

If you explicitly want to name your new column num. You can use reset_index with name=.. since groupby will return a series:

cdf.groupby(['color1','color2']).size().reset_index(name='num')

  color1 color2  num
0   blue  green    1
1   blue    red    1
2  green   blue    3
3    red  green    4
4    red    red    1

Upvotes: 2

Related Questions