GreenGodot
GreenGodot

Reputation: 6773

Getting the size of a groupby operation in Pandas

I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':

Name | As | Bs | Cs   |  Note
Mark   3     4     7     Good
Luke   2     1     12    Well
Mark   5     6     8     Ok
John   1     18    3     Great

So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:

temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:

Name | As | Bs | Cs   |  Note   | Count
Mark   8     10    15    Good      2
Luke   2     1     12    Well      1
John   1     18    3     Great     1

How do I modify the above line of code to do what I need?

Upvotes: 4

Views: 13419

Answers (3)

Charles Plager
Charles Plager

Reputation: 604

With modern pandas:

temp_df.groupby(['Name'], as_index=False).agg(
  {
     'As': np.sum, 
     'Bs': np.sum,
     'Cs': np.sum, 
     'count':'size'
  }
)

Upvotes: 0

jezrael
jezrael

Reputation: 863166

Use first + size and then is necessary rename columns by dict:

temp_df = temp_df.groupby('Name', sort=False) \
                .agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
                .rename(columns={'Name':'Count'}) \
                .reset_index() \
                .reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1) 
print (temp_df)
   Name  As  Bs  Cs   Note  Count
0  Mark   8  10  15   Good      2
1  Luke   2   1  12   Well      1
2  John   1  18   3  Great      1

Dont use count, only size or len.

What is the difference between size and count in pandas?

Upvotes: 3

VinceP
VinceP

Reputation: 2173

Create the group and do your aggregations:

the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})

then compute the size from the_group

temp_df['count'] = the_group.count()['Note']

gives:

   Name  Cs  As  Bs  count
0  John   3   1  18      1
1  Luke  12   2   1      1
2  Mark  15   8  10      2

Edit:

As suggested in the comments, it is safer to use size() in case the data include NaN:

temp_df['count'] = the_group.size().reset_index()[0] 

Upvotes: 5

Related Questions