pandas groupby, filter and aggregate function

Question

I have following dataframe in pandas

  key     time      outlier      
  1_2     4         False
  1_2     2         True
  1_2     2         True
  1_2     5         True
  1_2     6         False
  1_3     10        False
  1_3     12        False
  1_3     10        True
  1_3     20        True

I want to calculate mean, median and count at every key level where outlier is True. My desired dataframe is as follows

  key     time      outlier     outlier_avg      outlier_median     outlier_count    
  1_2     4         False       3                2                  3
  1_2     2         True        3                2                  3 
  1_2     2         True        3                2                  3
  1_2     5         True        3                2                  3
  1_2     6         False       3                2                  3
  1_3     10        False       15               15                 2
  1_3     12        False       15               15                 2
  1_3     10        True        15               15                 2
  1_3     20        True        15               15                 2

Mean, median and count will be duplicated across key which is fine. I am doing following thing in pandas

  data.groupby('key').filter(lambda x : x['outlier'] == True).agg({'time':['mean','median', 'count'])

jezrael · Accepted Answer

First filter by boolean indexing, then aggregate, rename columns names by DataFrame.add_prefix and last DataFrame.join to original:

df = (data.join(data[data['outlier']].groupby('key')['time'].agg(['mean','median', 'count'])
                                     .add_prefix('outlier_'), on='key'))
print (df)
   key  time  outlier  outlier_mean  outlier_median  outlier_count
0  1_2     4    False             3               2              3
1  1_2     2     True             3               2              3
2  1_2     2     True             3               2              3
3  1_2     5     True             3               2              3
4  1_2     6    False             3               2              3
5  1_3    10    False            15              15              2
6  1_3    12    False            15              15              2
7  1_3    10     True            15              15              2
8  1_3    20     True            15              15              2

EDIT: Solution for processing Trues and Falses columns separately:

df = data.groupby(['key','outlier'])['time'].agg(['mean','median', 'count']).unstack()
df.columns = df.columns.map(lambda x: f'outlier_{x[0]}_{x[1]}')

df = data.join(df, on='key')

print (df)
   key  time  outlier  outlier_mean_False  outlier_mean_True  \
0  1_2     4    False                   5                  3   
1  1_2     2     True                   5                  3   
2  1_2     2     True                   5                  3   
3  1_2     5     True                   5                  3   
4  1_2     6    False                   5                  3   
5  1_3    10    False                  11                 15   
6  1_3    12    False                  11                 15   
7  1_3    10     True                  11                 15   
8  1_3    20     True                  11                 15   

   outlier_median_False  outlier_median_True  outlier_count_False  \
0                     5                    2                    2   
1                     5                    2                    2   
2                     5                    2                    2   
3                     5                    2                    2   
4                     5                    2                    2   
5                    11                   15                    2   
6                    11                   15                    2   
7                    11                   15                    2   
8                    11                   15                    2   

   outlier_count_True  
0                   3  
1                   3  
2                   3  
3                   3  
4                   3  
5                   2  
6                   2  
7                   2

pandas groupby, filter and aggregate function

Answers (1)

Related Questions