Arnold Klein

Reputation: 3086

Applying function on particular rows with GroupBy

How can I compute mean() or another function on particular rows within each group when using GroupBy? Consider the following dataframe:

 In[239]: df.groupby(['id'])['summary']
Out[239]: 
                summary
id         
11                  2.0
11                  3.0
11                  3.0
11                  3.0
11                  3.0
11                  3.0
14                  NaN
14                  NaN
14                  NaN
14                  NaN
14                  NaN
14                  2.0
17                  NaN
17                  NaN
17                  NaN
17                  NaN
17                  5.0
17                  5.0
18                  4.0
18                  5.0
18                  4.0
18                  3.0
18                  3.0
18                  4.0
23                  2.0
23                  1.0
23                  2.0
23                  1.0
23                  3.0
23                  1.0
                ...
81                 10.0
81                  9.0
81                  8.0
81                  8.0
81                  9.0
81                  9.0
82                  0.0
82                  0.0
82                  0.0
82                  0.0
82                  0.0
82                  0.0
83                  1.0
83                  0.0
83                  1.0
83                  2.0
83                  2.0
83                  1.0
84                  2.0
84                  0.0
84                  0.0
84                  0.0
84                  1.0
84                  NaN
85                  5.0
85                  4.0
85                  4.0
85                  5.0
85                  5.0
85                  4.0
  1. How can I compute the mean() of only the first three rows of each id?
  2. How can I compute the mean() of masked rows (rows selected by some condition) within each id?

For example:

df.groupby(['id'])['summary'].mean()

computes the mean() of each group (defined by id), but it uses all rows of the group.

Upvotes: 0

Views: 55

Answers (1)

Ted Petrou

Reputation: 61967

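The following computes both the mean of the first three rows and the mean of the masked rows within each group. Note that mask is not defined in the question; it is assumed here to be a boolean indexer that lines up with the rows of each group.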

df.groupby('id')['summary'].agg([lambda x: x.iloc[:3].mean(), lambda x: x[mask].mean()])
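As a rough, self-contained sketch of the same two ideas, the snippet below uses a small made-up frame (the values, the column layout with id as a regular column, and the condition summary > 2 are all assumptions for illustration, not taken from the question's data). It shows one possible way to take the mean of the first three rows per id, and two ways to take the mean of masked rows.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data: 'id' is assumed to be a regular column here.
# If 'id' is the index instead, df.groupby(level="id") is the analogous call.
df = pd.DataFrame({
    "id":      [11, 11, 11, 11, 14, 14, 14, 14, 17, 17],
    "summary": [2.0, 3.0, 3.0, 4.0, nan, nan, 2.0, 6.0, 1.0, 2.0],
})

# 1. Mean of only the first three rows of each id.
#    head(3) keeps the first three rows per group; mean() skips NaN by default.
first3_mean = df.groupby("id").head(3).groupby("id")["summary"].mean()

# 2a. Mean of masked rows: filter first, then group.
#     Groups with no matching rows drop out of the result.
mask = df["summary"] > 2  # illustrative condition only
masked_mean = df[mask].groupby("id")["summary"].mean()

# 2b. Same idea, but non-matching rows become NaN instead of being removed,
#     so every id stays in the result (NaN where nothing matched).
masked_mean_all_ids = df["summary"].where(mask).groupby(df["id"]).mean()

print(first3_mean)
print(masked_mean)
print(masked_mean_all_ids)
```

Filtering before grouping (2a) is usually the simplest option, while the where() variant (2b) may be preferable when the result has to contain every id.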

Upvotes: 2
