Reputation: 6773
I've been performing a groupby operation on a dataframe I have that aggregates columns together based on the column 'Name':
Name | As | Bs | Cs | Note
Mark 3 4 7 Good
Luke 2 1 12 Well
Mark 5 6 8 Ok
John 1 18 3 Great
So in this case, the rows with 'Mark' are aggregated together on columns A,B, and C using this code:
temp_df = temp_df.groupby(['Name'], as_index=False).agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})
A thing I need to add in is to do a count on the number of rows that have the same value in 'Name'. This would give me an output like:
Name | As | Bs | Cs | Note | Count
Mark 8 10 15 Good 2
Luke 2 1 12 Well 1
John 1 18 3 Great 1
How do I modify the above line of code to do what I need?
Upvotes: 4
Views: 13419
Reputation: 604
With modern pandas:
temp_df.groupby(['Name'], as_index=False).agg(
{
'As': np.sum,
'Bs': np.sum,
'Cs': np.sum,
'count':'size'
}
)
Upvotes: 0
Reputation: 863166
Use first
+ size
and then is necessary rename
columns by dict:
temp_df = temp_df.groupby('Name', sort=False) \
.agg({'As':np.sum,'Bs':np.sum,'Cs':np.sum,'Note':'first','Name':'size'}) \
.rename(columns={'Name':'Count'}) \
.reset_index() \
.reindex_axis(temp_df.columns.tolist() + ['Count'], axis=1)
print (temp_df)
Name As Bs Cs Note Count
0 Mark 8 10 15 Good 2
1 Luke 2 1 12 Well 1
2 John 1 18 3 Great 1
Dont use count
, only size
or len
.
What is the difference between size and count in pandas?
Upvotes: 3
Reputation: 2173
Create the group and do your aggregations:
the_group = temp_df.groupby(['Name'], as_index=False)
temp_df = the_group.agg({'As': np.sum, 'Bs': np.sum,'Cs': np.sum})
then compute the size
from the_group
temp_df['count'] = the_group.count()['Note']
gives:
Name Cs As Bs count
0 John 3 1 18 1
1 Luke 12 2 1 1
2 Mark 15 8 10 2
As suggested in the comments, it is safer to use size()
in case the data include NaN
:
temp_df['count'] = the_group.size().reset_index()[0]
Upvotes: 5