Reputation: 13
I am trying to get a rolling average of every third (or second) row in a specific column. More precisely, I want to store the rolling average in a new variable in my existing dataframe.
I tried this code:
w %>%
mutate(rmean = roll_meanl(SWI_001, 3)) %>%
filter(between(date, 2, 4)) %>%
select(-date)
I want the rolling average of every third row of variable SWI_001. The new variable with the data should be roll_mean1.
But I am not sure how to filter it properly.
My dataframe (w) looks like this, but it has 255 rows in total and almost 50 columns:
Date ASS_SUB ASS_TOP SWI_001 SWI_005 SWI_010 SWI_100 B1_005 B1_015 B1_025 B1_035 B1_045 B1_055 B1_065
1: 06.11.2018 17.37 14.16 15.54 NA 15.57 NA NA NA NA NA NA NA NA
2: 07.11.2018 17.32 13.90 15.21 NA 15.51 NA 14.4 14.8 14.8 14.8 14.8 14.8 14.8
3: 08.11.2018 17.29 13.77 14.75 NA 15.39 NA 14.2 14.8 14.8 14.8 14.8 14.8 14.8
4: 09.11.2018 17.27 13.64 15.05 NA 15.38 NA 14.0 14.8 14.8 14.8 14.8 14.8 14.8
5: 10.11.2018 17.24 13.64 14.89 NA 15.30 NA 13.9 14.8 14.8 14.8 14.8 14.8 14.8
---
In the end I need the rolling mean for more than just one column, but to get started I am happy for a nudge in the right direction.
Upvotes: 0
Views: 367
Reputation: 2509
Lots of different ways to do this:
- rollmean in the zoo package
- MovingAverages in TTR
- ma in forecast
See also here: Calculating moving average
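For completeness, a centred 3-row moving average can also be sketched in base R with stats::filter(), without any extra packages. This is only an illustration on the SWI_001 values from the question; the names swi and rmean are made up for the example. The stats:: prefix matters once dplyr is loaded, because dplyr::filter() masks it.

```r
# SWI_001 values copied from the five example rows in the question
swi <- c(15.54, 15.21, 14.75, 15.05, 14.89)

# Equal weights over a window of 3; sides = 2 centres the window,
# so the first and last positions have no full window and come back as NA
rmean <- as.numeric(stats::filter(swi, rep(1 / 3, 3), sides = 2))

print(round(rmean, 3))
```

The result has the same length as the input, so it can be assigned straight into a new column.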
For your next question, it would be great if you could include the data as code, using dput(). I tried to recreate it quickly:
library(tidyverse)
tibble::tribble(
~Date, ~ASS_SUB, ~ASS_TOP, ~SWI_001, ~SWI_005, ~SWI_010, ~SWI_100, ~B1_005, ~B1_015, ~B1_025, ~B1_035, ~B1_045, ~B1_055, ~B1_065,
"06.11.2018",17.37,14.16,15.54,NA,15.57,NA, NA, NA, NA, NA, NA, NA, NA,
"07.11.2018",17.32,13.90,15.21,NA,15.51,NA,14.4,14.8,14.8,14.8,14.8,14.8,14.8,
"08.11.2018",17.29,13.77,14.75,NA,15.39,NA,14.2,14.8,14.8,14.8,14.8,14.8,14.8,
"09.11.2018",17.27,13.64,15.05,NA,15.38,NA,14.0,14.8,14.8,14.8,14.8,14.8,14.8,
"10.11.2018",17.24,13.64,14.89,NA,15.30,NA,13.9,14.8,14.8,14.8,14.8,14.8,14.8
) -> w
For your example, I'd use rollmean from the zoo package. By default the window is centred, so with a width of 3 each value is averaged with the one before and the one after it. We use the fill = NA option so that rows without a full window of 3 values (the first and the last row) just get NA. This means you don't have to filter your data. You can modify this of course; see ?na.fill for more info.
library(zoo)
w %>%
mutate(rmean = rollmean(SWI_001,3, fill = NA))
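If you would rather average each row with the rows before it, rollmean also takes an align argument. The sketch below compares the default centred window with a right-aligned one on the SWI_001 values from the question; the vector names are made up for the illustration, and it assumes the zoo package is installed.

```r
library(zoo)

# SWI_001 values copied from the five example rows in the question
swi <- c(15.54, 15.21, 14.75, 15.05, 14.89)

# Default: align = "center" (previous, current and next row)
centred <- rollmean(swi, 3, fill = NA)

# align = "right": current row and the two rows before it
trailing <- rollmean(swi, 3, fill = NA, align = "right")

print(data.frame(swi, centred, trailing))
```

With the right-aligned window the NA padding moves to the first two rows instead of the first and the last.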
If you want to apply it to a large number of columns, you can use the across() function from the dplyr package:
w %>%
mutate(across(.cols = c(SWI_001, SWI_010), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean"))
or even for all numeric columns:
w %>%
mutate(across(.cols = where(is.numeric), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean"))
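One caveat when applying this to every numeric column: several columns in the example contain NA, and the zoo documentation notes that the default rollmean method does not handle inputs containing NAs and points to rollapply instead. A minimal sketch, assuming you want to average over whatever non-missing values fall in each window (b1 and b1_rmean are made-up names; the values are B1_005 from the question):

```r
library(zoo)

# B1_005 values from the example rows; the first one is missing
b1 <- c(NA, 14.4, 14.2, 14.0, 13.9)

# rollapply() passes extra arguments (here na.rm = TRUE) on to mean(),
# so a window that contains an NA is averaged over its remaining values
b1_rmean <- rollapply(b1, 3, mean, na.rm = TRUE, fill = NA)

print(b1_rmean)
```

Without na.rm = TRUE, any window that touches an NA would itself come back as NA, which may be what you want if partial windows should not count.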
A bit more on the filtering:
I'm not sure whether the 1:, 2: etc. are part of your Date variable (case 1 below) or just the row numbers (case 2). Either way, I'd recommend converting the column to the Date class using one of:
Case 1:
w %>%
mutate(Date = gsub("^\\s*[0-9]+:\\s*", "", Date), # strip a leading row number of any length, e.g. "10: "
Date = as.Date(Date, format = "%d.%m.%Y"))
Case 2:
w %>%
mutate(Date = as.Date(Date, format = "%d.%m.%Y"))
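To check the conversion on its own, here is a small base R sketch. The vector raw is made up for the illustration and assumes the row numbers really are part of the strings. With 255 rows the prefix can have up to three digits, so the pattern is anchored to the start of the string and allows any number of digits.

```r
# Hypothetical Date strings that still carry the printed row number
raw <- c("1: 06.11.2018", "10: 15.11.2018", "255: 18.07.2019")

# Drop optional leading spaces, the row number, the colon and the spaces after it
clean <- sub("^\\s*[0-9]+:\\s*", "", raw)

# Parse day.month.year into the Date class
parsed <- as.Date(clean, format = "%d.%m.%Y")

print(parsed)
```

Once the column is of class Date, comparisons such as parsed >= as.Date("2018-11-07") work as expected.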
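Once you have this, you can filter by date. Combined with the steps above (the rolling mean is computed before filtering, so the rows at the edges of the filtered range still use their neighbours outside it):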
w %>%
mutate(Date = gsub("^\\s*[0-9]+:\\s*", "", Date), # strip a leading row number of any length, e.g. "10: "
Date = as.Date(Date, format = "%d.%m.%Y")) %>%
mutate(across(.cols = c(SWI_001, SWI_010), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean")) %>%
filter(between(Date, as.Date("2018-11-07"),as.Date("2018-11-09")))
Upvotes: 1