Dirk Buttke

Reputation: 355

How to calculate mean value of subset of rows for different groups?

Assuming the following data frame:

set.seed(2409)

df <- data.frame(group = rep(1:4, each=5), value = round(runif(20, 1, 10),0))

df

     group value
1        1     4
2        1     9
3        1     7
4        1     1
5        1     6
6        2     5
7        2     8
8        2     5
9        2     5
10       2     3
11       3     6
12       3     1
13       3     4
14       3     4
15       3     9
16       4     6
17       4     5
18       4     7
19       4     7
20       4     4

I'm now interested in calculating the mean of the value column based on the first three (or n) rows for each group.

So, what I want to achieve is:

     group value       mean
1        1     4   6.666667 
2        1     9   6.666667 
3        1     7   6.666667 
4        1     1   6.666667 
5        1     6   6.666667 
6        2     5   6.000000 
7        2     8   6.000000 
8        2     5   6.000000 
9        2     5   6.000000 
10       2     3   6.000000 
11       3     6   3.666667 
12       3     1   3.666667 
13       3     4   3.666667 
14       3     4   3.666667 
15       3     9   3.666667 
16       4     6   6.000000 
17       4     5   6.000000 
18       4     7   6.000000 
19       4     7   6.000000 
20       4     4   6.000000 

I can get the values in the mean column e.g. by running:

sapply(split(df, df$group),
       function(x) mean(x[1:3, ]$value))

       1        2        3        4 
6.666667 6.000000 3.666667 6.000000 

But I am pretty sure there has to be a more elegant way to get these values, maybe by using dplyr. Calculating the overall mean for each group is easy:

library(dplyr)

df <- df %>% 
  group_by(group) %>% 
  mutate(mean = mean(value))

df  

   group value  mean
   <int> <dbl> <dbl>
 1     1     4   5.4
 2     1     9   5.4
 3     1     7   5.4
 4     1     1   5.4
 5     1     6   5.4
 6     2     5   5.2
 7     2     8   5.2
 8     2     5   5.2
 9     2     5   5.2
10     2     3   5.2
11     3     6   4.8
12     3     1   4.8
13     3     4   4.8
14     3     4   4.8
15     3     9   4.8
16     4     6   5.8
17     4     5   5.8
18     4     7   5.8
19     4     7   5.8
20     4     4   5.8

But how do I consider only the first 3 rows here?

Thank you very much for your help!

Upvotes: 0

Views: 622

Answers (1)

r2evans

Reputation: 160407

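Wrapping the column in head(value, n) before calling mean() restricts the calculation to the first n rows of each group. If you need to do this repeatedly (programmatically) for several values of n, you can build one column per value: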

means <- c(2,3,5)
df %>%
  group_by(group) %>%
  mutate(as.data.frame(lapply(setNames(means, paste0("mean", means)), 
                              function(z) mean(head(value,z))))) %>%
  ungroup()
# # A tibble: 20 x 5
#    group value mean2 mean3 mean5
#    <int> <dbl> <dbl> <dbl> <dbl>
#  1     1     4   6.5  6.67   5.4
#  2     1     9   6.5  6.67   5.4
#  3     1     7   6.5  6.67   5.4
#  4     1     1   6.5  6.67   5.4
#  5     1     6   6.5  6.67   5.4
#  6     2     5   6.5  6      5.2
#  7     2     8   6.5  6      5.2
#  8     2     5   6.5  6      5.2
#  9     2     5   6.5  6      5.2
# 10     2     3   6.5  6      5.2
# 11     3     6   3.5  3.67   4.8
# 12     3     1   3.5  3.67   4.8
# 13     3     4   3.5  3.67   4.8
# 14     3     4   3.5  3.67   4.8
# 15     3     9   3.5  3.67   4.8
# 16     4     6   5.5  6      5.8
# 17     4     5   5.5  6      5.8
# 18     4     7   5.5  6      5.8
# 19     4     7   5.5  6      5.8
# 20     4     4   5.5  6      5.8
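
For the single-n case asked about in the question, the same head() idea can be reduced to one mutate() call. The following is a minimal sketch, not the only way to do it: the variable names n and result are only illustrative, and the data frame is typed out from the values printed in the question so that it does not depend on the random seed.

```r
library(dplyr)

# Same values as the question's example data, written out explicitly.
df <- data.frame(
  group = rep(1:4, each = 5),
  value = c(4, 9, 7, 1, 6,
            5, 8, 5, 5, 3,
            6, 1, 4, 4, 9,
            6, 5, 7, 7, 4)
)

n <- 3  # illustrative: number of leading rows per group to average

result <- df %>%
  group_by(group) %>%
  # head(value, n) keeps at most the first n values of the current group;
  # the single mean is recycled across every row of that group.
  mutate(mean = mean(head(value, n))) %>%
  ungroup()

result
```

Unlike indexing with x[1:3, ], head() does not pad with NA when a group has fewer than n rows, so such a group simply gets the mean of the rows it has.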

Upvotes: 1
