daves_ma

Reputation: 13

Creating a rolling mean of every third row in one column and storing it in a new variable

I am trying to get a rolling average of every third (or second) row in a specific column. More precisely, I want the rolling average stored as a new variable in my existing data frame.

I tried to use this function:

w %>% 
  mutate(rmean = roll_meanl(SWI_001, 3)) %>% 
  filter(between(date, 2, 4)) %>% 
  select(-date)

I want the rolling average of every third row of variable SWI_001. The new variable with the data should be roll_mean1.

But I am not sure how to filter it properly.

My data frame (w) looks like this, but it has 255 rows in total and almost 50 columns:

           Date ASS_SUB ASS_TOP SWI_001 SWI_005 SWI_010 SWI_100 B1_005 B1_015 B1_025 B1_035 B1_045 B1_055 B1_065
  1: 06.11.2018   17.37   14.16   15.54      NA   15.57      NA     NA     NA     NA     NA     NA     NA     NA
  2: 07.11.2018   17.32   13.90   15.21      NA   15.51      NA   14.4   14.8   14.8   14.8   14.8   14.8   14.8
  3: 08.11.2018   17.29   13.77   14.75      NA   15.39      NA   14.2   14.8   14.8   14.8   14.8   14.8   14.8
  4: 09.11.2018   17.27   13.64   15.05      NA   15.38      NA   14.0   14.8   14.8   14.8   14.8   14.8   14.8
  5: 10.11.2018   17.24   13.64   14.89      NA   15.30      NA   13.9   14.8   14.8   14.8   14.8   14.8   14.8
 ---                                                                                                            

In the end I need the rolling mean for more than just one column, but to get started I would be happy with a nudge in the right direction.

Upvotes: 0

Views: 367

Answers (1)

Moritz Schwarz

Reputation: 2509

There are lots of different ways to do this:

  • use the functions for rolling means/maximums/medians in the zoo package (rollmean)
  • use the moving-average functions in TTR (see ?MovingAverages, e.g. SMA)
  • use ma in forecast

See also here: Calculating moving average
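
If you would rather not install a package, base R can do the same thing. This is only a minimal sketch on the five SWI_001 values from the question: it uses stats::filter() with equal weights, and I call it with the stats:: prefix because dplyr masks filter() once the tidyverse is loaded.

```r
# SWI_001 values copied from the question's printed data
x <- c(15.54, 15.21, 14.75, 15.05, 14.89)

# Equal weights of 1/3 give a 3-point moving average.
# sides = 2 centers the window on each row, so the first and last
# rows have no complete window and come back as NA.
rmean <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
rmean
```

as.numeric() drops the time-series class that stats::filter() returns, so the result can go straight into mutate().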

For your next question, it would be great if you could include your data in a reproducible form, for example the output of dput(). I tried to recreate it quickly:

library(tidyverse)
w <- tibble::tribble(
  ~Date, ~ASS_SUB, ~ASS_TOP, ~SWI_001, ~SWI_005, ~SWI_010, ~SWI_100, ~B1_005, ~B1_015, ~B1_025, ~B1_035, ~B1_045, ~B1_055, ~B1_065,
  "06.11.2018",17.37,14.16,15.54,NA,15.57,NA,  NA,  NA, NA, NA, NA, NA, NA,
  "07.11.2018",17.32,13.90,15.21,NA,15.51,NA,14.4,14.8,14.8,14.8,14.8,14.8,14.8,
  "08.11.2018",17.29,13.77,14.75,NA,15.39,NA,14.2,14.8,14.8,14.8,14.8,14.8,14.8,
  "09.11.2018",17.27,13.64,15.05,NA,15.38,NA,14.0,14.8,14.8,14.8,14.8,14.8,14.8,
  "10.11.2018",17.24,13.64,14.89,NA,15.30,NA,13.9,14.8,14.8,14.8,14.8,14.8,14.8
)

For your example, I'd use rollmean from the zoo package. By default the window is centered, so each value is averaged with the one before and the one after it. The fill = NA option means that positions without a full window of 3 values (here the first and last rows) are set to NA, so the result has the same length as the input and you don't have to filter your data. You can modify this, of course; see ?na.fill for more info.

library(zoo)
w %>% 
  mutate(rmean = rollmean(SWI_001,3, fill = NA))
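
If you want a trailing average instead (the current row plus the two before it), rollmean takes an align argument. A small sketch on the SWI_001 values from the question, just to show how the NA padding moves; the variable names centered and trailing are mine:

```r
library(zoo)

# SWI_001 values copied from the question's printed data
x <- c(15.54, 15.21, 14.75, 15.05, 14.89)

# Default: align = "center", so NA lands on the first and last row
centered <- rollmean(x, 3, fill = NA)

# align = "right": the window ends at the current row,
# so the first two rows have no complete window and are NA
trailing <- rollmean(x, 3, fill = NA, align = "right")

data.frame(x, centered, trailing)
```

Which alignment is right depends on whether each row's average should be allowed to use the following day's value.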

If you want to apply it to a larger number of columns, you can use the across() function from the dplyr package:

w %>% 
  mutate(across(.cols = c(SWI_001, SWI_010), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean"))

or even for all numeric columns:

w %>% 
  mutate(across(.cols = where(is.numeric), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean"))
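
One caveat when you run this over all columns: several of yours contain NA. As far as I know, the zoo help page says the default rollmean method does not handle NA in the input and points to rollapply instead. A hedged sketch of that variant on the B1_005 values from the question (the names b1 and res are mine):

```r
library(zoo)

# B1_005 values copied from the question; the first one is missing
b1 <- c(NA, 14.4, 14.2, 14.0, 13.9)

# rollapply() calls mean() on each centered window of 3 and passes
# na.rm = TRUE through to it, so missing values inside a window are skipped.
# fill = NA still pads the first and last row, which have no full window.
res <- rollapply(b1, width = 3, FUN = mean, na.rm = TRUE, fill = NA)
res
```

The same call should work inside across() as ~rollapply(., 3, mean, na.rm = TRUE, fill = NA). Note that a window containing only NA (as in SWI_005) gives NaN rather than NA, because mean() of an empty vector is NaN.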

A bit more on the filtering:

I'm not sure whether the 1:, 2: etc. are part of your Date variable (case 1 below) or just the row numbers (case 2). Either way, I'd recommend converting the column to Date format using one of these:

Case 1:

w %>% 
  mutate(Date = gsub("^\\s*[0-9]+:\\s*", "", Date),  # strip a leading "12: " prefix of any length
         Date = as.Date(Date, format = "%d.%m.%Y"))

Case 2:

w %>% 
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"))

Once the column is a Date, you can easily filter on it. Combining this with the rolling mean from above:

w %>% 
  mutate(Date = gsub("^\\s*[0-9]+:\\s*", "", Date),  # strip a leading "12: " prefix of any length
         Date = as.Date(Date, format = "%d.%m.%Y")) %>%
  mutate(across(.cols = c(SWI_001, SWI_010), .fns = ~rollmean(., 3, fill = NA), .names = "{.col}_rmean")) %>% 
  filter(between(Date, as.Date("2018-11-07"), as.Date("2018-11-09")))

Upvotes: 1
