How to aggregate without affecting certain columns in R?

Question

I am trying to condense my data by calculating the mean of every 15 rows in my data set, like this:

n <- 15
aggregate(df[c("columnC", "ColumnD")], list(rep(1:(nrow(df) %/% n + 1), each = n, len = nrow(df))), mean)[-1]

This works, but I have 2 other columns that hold discrete values. I obviously cannot take the mean of discrete values, and the code above drops those columns, so the result only has columnC and columnD. How can I do this so that, for the discrete columns, I just take the value from the 15th row of each block?

For example, if I have data like this:

1   Sunday   Evening             16.2  235.84
2   Sunday   Evening             23.4  235.29
3   Sunday   Evening             29.4  232.79
4   Sunday   Evening             24.2  233.89
5   Sunday   Evening             24.2  233.66
6   Sunday   Evening             24.2  233.38
7   Sunday   Evening             24.2  232.99
8   Sunday   Evening             25.4  233.21
9   Sunday   Evening             26.8  232.37
10  Sunday     Night             25.6  231.55
11  Sunday     Night             24.4  231.19
12  Sunday     Night             24.4  231.63
13  Sunday     Night             24.4  231.71
14  Sunday     Night             25.2  231.23
15  Sunday     Night             25.2  231.23

I would want to take the mean of the 3rd and 4th columns, and for the 1st and 2nd columns I'd be happy with "Sunday" and "Night", because those are the values on the 15th row.

Ronak Shah · Accepted Answer

To keep it simple, I took n = 3 for the example you shared and used dplyr in the following way:

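library(dplyr)

n <- 3  # rows per group
df %>%
  # group index 1,1,1,2,2,2,...: one group per n consecutive rows
  group_by(group = rep(1:(nrow(df) %/% n + 1), each = n, length.out = nrow(df))) %>%
  summarise(three_mean = mean(V3),   # mean of the numeric columns
            four_mean  = mean(V4),
            last_v1    = last(V1),   # last value in each group for the discrete columns
            last_v2    = last(V2))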


#  group three_mean four_mean last_v1 last_v2
#1     1       23.0       235 Sunday  Evening
#2     2       24.2       234 Sunday  Evening
#3     3       25.5       233 Sunday  Evening
#4     4       24.8       231 Sunday  Night  
#5     5       24.9       231 Sunday  Night

This returns the mean of every 3 rows for columns 3 and 4, and takes the last value in each group of 3 rows for columns 1 and 2.
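The grouping vector is what makes this work, so it may help to look at it on its own. The following is a minimal sketch in base R, assuming 15 rows as in the example; `n_rows` and `group_alt` are names introduced here only for illustration.

```r
n <- 3
n_rows <- 15  # assumed number of rows, matching the example data

# Repeat each group id n times, then trim to exactly n_rows entries
group <- rep(1:(n_rows %/% n + 1), each = n, length.out = n_rows)
print(group)

# An equivalent way to build the same index from the row positions
group_alt <- (seq_len(n_rows) - 1) %/% n + 1
stopifnot(all(group == group_alt))
```

If the number of rows is not a multiple of n, the last group simply ends up with fewer than n rows.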

For your real data, this should work if you change n to 15 and replace V1 to V4 with your actual column names.
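If you would rather stay with aggregate() as in the question, the same idea can be sketched in base R. This is only an illustration under assumptions: the data frame below is rebuilt from the sample rows in the question, and the column names V1 to V4 are the placeholders used above, not the real ones.

```r
# Sample data from the question; column names V1..V4 are placeholders
df <- data.frame(
  V1 = rep("Sunday", 15),
  V2 = rep(c("Evening", "Night"), times = c(9, 6)),
  V3 = c(16.2, 23.4, 29.4, 24.2, 24.2, 24.2, 24.2, 25.4, 26.8,
         25.6, 24.4, 24.4, 24.4, 25.2, 25.2),
  V4 = c(235.84, 235.29, 232.79, 233.89, 233.66, 233.38, 232.99, 233.21,
         232.37, 231.55, 231.19, 231.63, 231.71, 231.23, 231.23)
)

n <- 15
group <- (seq_len(nrow(df)) - 1) %/% n + 1  # 1 for rows 1..15, 2 for 16..30, ...

# Means of the numeric columns, one row per group
means <- aggregate(df[c("V3", "V4")], list(group = group), mean)

# The last row of each group supplies the discrete values
last_rows <- df[!duplicated(group, fromLast = TRUE), c("V1", "V2")]

result <- cbind(last_rows, means[c("V3", "V4")])
rownames(result) <- NULL
print(result)
```

Because the group index never decreases, the rows picked by duplicated() come out in the same order as the groups returned by aggregate(), so cbind() lines them up correctly.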
