Dinosaurius
Dinosaurius

Reputation: 8628

How to count and sum entries per each group?

This is the dataframe:

GROUP   TIME  EVAL
AAA     20    0
AAA     22    0
AAA     21    1
AAA     20    0
BBB     20    0

I want to see how many entries belong to each grouping and how many entries have EVAL equal to 1 in each grouping. I have almost finished the code, but just not sure how to count entries per group. It seems to search for a column TOTAL_CALLS, while I want to create it.

final = df.groupby(['GROUP']).agg({'TIME':'mean','EVAL':'sum','TOTAL_NUM':'count'}).reset_index()

Upvotes: 1

Views: 286

Answers (3)

Venkatachalam
Venkatachalam

Reputation: 16966

Using the Time column itself, we can calculate both number of records and mean time for each group. This can be achieved by sending a list ['mean','count'] for the aggregation. we could find the sum of Eval for each group as well.

  print(data.groupby(['Group']).agg({'Time':['mean','count'],'Eval' : 'sum'}).reset_index())


  Group Eval   Time      
         sum   mean count
0   AAA    1  20.75     4
1   BBB    0  20.00     1

Upvotes: 1

b2002
b2002

Reputation: 914

df:
    EVAL GROUP  TIME
0  0     AAA   20  
1  0     AAA   22  
2  1     AAA   21  
3  0     AAA   20  
4  0     BBB   20     

# group by 'GROUP'
gb = df.groupby('GROUP')

# aggregate each column by method
final = gb.agg({'GROUP': 'count',
                'EVAL': 'sum',
                'TIME': 'mean'})

# rename columns and reset index
final = final.rename(columns={'GROUP': 'GROUP_TOTAL',
                              'EVAL': 'EVAL_COUNT',
                              'TIME': 'TIME_AVG'}).reset_index()

out:
  GROUP  EVAL_COUNT  TIME_AVG  GROUP_TOTAL
0  AAA   1          20.7500    4          
1  BBB   0          20.0000    1          

Upvotes: 1

jezrael
jezrael

Reputation: 862841

You can use lambda and sum boolean mask:

final = df.groupby(['GROUP']).agg({'TIME':['mean', 'size'],
                                  'EVAL': [lambda x: (x == 1).sum(), 'sum']})
final = final.rename(columns={'<lambda>':'size_of_1'})
final.columns = final.columns.map('_'.join)
final = final.reset_index()
print (final)
  GROUP  EVAL_size_of_1  EVAL_sum  TIME_mean  TIME_size
0   AAA               1         1      20.75          4
1   BBB               0         0      20.00          1

Upvotes: 1

Related Questions