I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below.
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
I would like to calculate the rolling mean for x, y and z, by a and then by b.
I tried the following, but I am getting an error.
test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))
The error:
Error in xu[k:n] - xu[c(1, seq_len(n - k))] :
non-numeric argument to binary operator
It seems like there is an issue with the fact that index is a character, but I need it in order to calculate the means...
Upvotes: 3
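1) ave Try ave rather than tapply, and make sure it is applied only over the columns of interest, i.e. columns 3, 4 and 5. In the question's call, rollmean(df, 5, fill = NA) is evaluated on the whole data frame, non-numeric date and index columns included, before any grouping happens, hence the error.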
library(zoo)  # provides rollmean
roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))
giving:
date index x y z
1 2015-02-01 a NA NA NA
2 2015-02-02 a NA NA NA
3 2015-02-03 a 66.50522 127.45650 129.8472
4 2015-02-04 a 61.71320 123.83633 129.7673
5 2015-02-05 a 56.56125 120.86158 126.1371
6 2015-02-06 a 66.13340 119.93428 127.1819
7 2015-02-07 a 59.56807 105.83208 125.1244
8 2015-02-08 a 49.98779 95.66024 139.2321
9 2015-02-09 a NA NA NA
10 2015-02-10 a NA NA NA
11 2015-02-01 b NA NA NA
12 2015-02-02 b NA NA NA
13 2015-02-03 b 55.71327 117.52219 139.3961
14 2015-02-04 b 54.58450 107.81763 142.6101
15 2015-02-05 b 50.48102 104.94084 136.3167
16 2015-02-06 b 37.89790 95.45489 135.4044
17 2015-02-07 b 33.05259 85.90916 150.8673
18 2015-02-08 b 49.91385 90.04940 147.1376
19 2015-02-09 b NA NA NA
20 2015-02-10 b NA NA NA
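To see that ave keeps the windows inside each group, here is a minimal sketch on a small deterministic data frame. The name toy and its values are made up for illustration and are not part of the question's data.

```r
library(zoo)

# Hypothetical toy data: two groups of six values each, chosen so the
# rolling means are easy to check by hand
toy <- data.frame(index = rep(c("a", "b"), each = 6),
                  x = as.numeric(c(1:6, 101:106)))

roll <- function(x) rollmean(x, 5, fill = NA)

# ave() splits x by index, applies roll to each piece and returns the
# results in the original row order
toy$x_roll <- ave(toy$x, toy$index, FUN = roll)
toy
```

With a centered window of width 5, each group of six rows should have exactly two non-NA values, and no window mixes values from a and b.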
2) by Another way is to use by. roll2 handles one group, by applies it to each group, producing a by list, and do.call("rbind", ...) puts it back together.
roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))
giving:
date index x y z
a.1 2015-02-01 a NA NA NA
a.2 2015-02-02 a NA NA NA
a.3 2015-02-03 a 66.50522 127.45650 129.8472
a.4 2015-02-04 a 61.71320 123.83633 129.7673
a.5 2015-02-05 a 56.56125 120.86158 126.1371
a.6 2015-02-06 a 66.13340 119.93428 127.1819
a.7 2015-02-07 a 59.56807 105.83208 125.1244
a.8 2015-02-08 a 49.98779 95.66024 139.2321
a.9 2015-02-09 a NA NA NA
a.10 2015-02-10 a NA NA NA
b.11 2015-02-01 b NA NA NA
b.12 2015-02-02 b NA NA NA
b.13 2015-02-03 b 55.71327 117.52219 139.3961
b.14 2015-02-04 b 54.58450 107.81763 142.6101
b.15 2015-02-05 b 50.48102 104.94084 136.3167
b.16 2015-02-06 b 37.89790 95.45489 135.4044
b.17 2015-02-07 b 33.05259 85.90916 150.8673
b.18 2015-02-08 b 49.91385 90.04940 147.1376
b.19 2015-02-09 b NA NA NA
b.20 2015-02-10 b NA NA NA
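The split-apply-combine steps that by performs can also be spelled out explicitly, which makes it easy to drop the a.1, b.11 style row names afterwards. This is only a sketch: toy and roll_group are hypothetical names, and split plus lapply stands in for by here.

```r
library(zoo)

# Hypothetical toy data with the same layout as df: date, id, value
toy <- data.frame(date  = rep(as.Date("2015-02-01") + 0:5, 2),
                  index = rep(c("a", "b"), each = 6),
                  x     = as.numeric(c(1:6, 101:106)))

# Handle one group: keep the id columns, roll the value column
roll_group <- function(g) {
  g[3] <- lapply(g[3], rollmean, k = 5, fill = NA)
  g
}

# Split by id, process each piece, then stack the pieces again
pieces <- lapply(split(toy, toy$index), roll_group)
out <- do.call("rbind", pieces)
rownames(out) <- NULL  # replace the "a.1", "b.7" style row names with 1..n
out
```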
3) wide form Another approach is to convert df from long form to wide form, in which case a plain rollmean will do it.
rollmean(read.zoo(df, split = 2), 5, fill = NA)
giving:
x.a y.a z.a x.b y.b z.b
2015-02-01 NA NA NA NA NA NA
2015-02-02 NA NA NA NA NA NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790 95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259 85.90916 150.8673
2015-02-08 49.98779 95.66024 139.2321 49.91385 90.04940 147.1376
2015-02-09 NA NA NA NA NA NA
2015-02-10 NA NA NA NA NA NA
This works because the dates are the same for both groups. If the dates were different, the wide form could contain NAs, which rollmean cannot handle. In that case use
rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
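As a rough illustration of the mismatched-dates case, the sketch below builds a small wide zoo object in which group b starts one day later than group a. The series a and b are made up, and the wide object is assembled with merge for brevity; read.zoo(df, split = 2) on long data with mismatched dates should give the same shape. Passing na.rm = TRUE through to mean is an optional extra, not something the answer above prescribes.

```r
library(zoo)

# Hypothetical series: b is shifted by one day relative to a
a <- zoo(as.numeric(1:6),     as.Date("2015-02-01") + 0:5)
b <- zoo(as.numeric(101:106), as.Date("2015-02-01") + 1:6)

# Outer join on the dates; days missing from one series become NA
wide <- merge(a, b)

# Windows that contain an NA yield NA
strict <- rollapply(wide, 5, mean, fill = NA)

# Optional: average whatever values are present in each window
lenient <- rollapply(wide, 5, mean, na.rm = TRUE, fill = NA)
lenient
```

Whether NA windows should stay NA or be averaged over the available values depends on how gaps in the data are meant to be treated.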
Note: Since the input uses random numbers in its definition, we must call set.seed first to make it reproducible. We used this:
set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
Upvotes: 3
This ought to do the trick using the dplyr and zoo packages:
library(dplyr)
library(zoo)
df %>%
group_by(index) %>%
mutate(x_mean = rollmean(x, 5, fill = NA),
y_mean = rollmean(y, 5, fill = NA),
z_mean = rollmean(z, 5, fill = NA))
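You could probably tidy this up more using mutate_each or some other form of mutate (in dplyr 1.0.0 and later, mutate_each is deprecated in favour of across).
You can also change the arguments within rollmean to fit your needs, such as align = "right". Note that na.pad = TRUE is deprecated in zoo; fill = NA, as used above, is its replacement.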
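A possible tidied-up version, assuming dplyr 1.0.0 or later so that across is available. The data frame toy and the column name x_trailing are made up for illustration.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data: two groups, two value columns
toy <- data.frame(index = rep(c("a", "b"), each = 6),
                  x = as.numeric(c(1:6, 101:106)),
                  y = as.numeric(c(11:16, 201:206)))

res <- toy %>%
  group_by(index) %>%
  mutate(
    # centered rolling mean for every listed column, named <col>_mean
    across(c(x, y), ~ rollmean(.x, 5, fill = NA), .names = "{.col}_mean"),
    # trailing window: each value averages the current and 4 previous rows
    x_trailing = rollmean(x, 5, fill = NA, align = "right")
  ) %>%
  ungroup()
res
```

With align = "right" the NAs all fall at the start of each group instead of being split between both ends.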
Upvotes: 2