user6452857
user6452857

Reputation: 117

R: Get multiple rolling means of multiple columns that is lagged

I want to get rolling means for the past 1 to 10 events grouped by a column for multiple columns. I also want it very fast such as in dplyr or data.table because I want to run this on a 1,000,000 x 1,000 dataframe.

starting df

data.table(a = c("bill", "bob", "bill", "bob", "bill", "bob"),
       b = c(1,2,1,1,3,2),
       c = c(2,3,9,1,4,1),
       d = c(4,5,1,7,3,4))

 1: bill 1 2 4
 2:  bob 2 3 5
 3: bill 1 9 1
 4:  bob 1 1 7
 5: bill 3 4 3
 6:  bob 2 1 4

desired df

I want the rolling mean of only b and c grouped by column a with a window of 1 to 10 for each column lagged 1 row.

     a  b c d b_roll1 c_roll1  b_roll2 c_roll2  b_roll3 c_roll3 
1: bill 1 2 4   NA    NA         NA        NA      NA     NA 
2:  bob 2 3 5   NA    NA         NA        NA      NA     NA  
3: bill 1 9 1   1     2           1        2       1       2   
4:  bob 1 1 7   2     3           2        3       2       3       
5: bill 3 4 3   1     9           1       5.5      1      5.5    
6:  bob 2 1 4   1     1           1        2       1       2              

Upvotes: 0

Views: 404

Answers (2)

Axeman
Axeman

Reputation: 35307

Your example outcome doesn't make too much sense to me, but here is an example on how you can generate many mutate calls programmatically.

An extendable solution using lazyeval and RcppRoll:

library(tidyverse)
vars <- c('b', 'c')
ns <- 1:10
com <- expand.grid(vars, ns, stringsAsFactors = FALSE)

dots <- map2(com[[1]], com[[2]],
             ~lazyeval::interp(~RcppRoll::roll_meanr(x, y, fill = NA), x = as.name(.x), y = .y))
names(dots) <- apply(com, 1, paste0, collapse = '_')

D %>%
  group_by(a) %>% 
  mutate_(.dots = dots)

Gives:

Source: local data frame [6 x 24]
Groups: a [2]

      a     b     c     d `b_ 1` `c_ 1` `b_ 2` `c_ 2`   `b_ 3`   `c_ 3` `b_ 4` `c_ 4` `b_ 5` `c_ 5` `b_ 6` `c_ 6` `b_ 7` `c_ 7` `b_ 8` `c_ 8` `b_ 9`
  <chr> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1  bill     1     2     4      1      2     NA     NA       NA       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
2   bob     2     3     5      2      3     NA     NA       NA       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
3  bill     1     2     1      1      2    1.0      2       NA       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
4   bob     1     1     7      1      1    1.5      2       NA       NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
5  bill     3     4     3      3      4    2.0      3 1.666667 2.666667     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
6   bob     2     1     4      2      1    1.5      1 1.666667 1.666667     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
# ... with 3 more variables: `c_ 9` <dbl>, b_10 <dbl>, c_10 <dbl>

Upvotes: 2

Edwin
Edwin

Reputation: 3242

I am still not completely following you. It seems that you apply a combination of a lag and a rolled mean. For just the rolled mean this is a solution using dplyr and RcppRoll.

roll_mean_na <- function(x, lag){
  c(rep(NA, lag - 1), RcppRoll::roll_mean(x, lag, align = "left"))
}

library(dplyr)
df %>% group_by(a) %>%  
   mutate(b_2 = roll_mean_na(b, 2), c_2 = roll_mean_na(c, 2),
          b_3 = roll_mean_na(b, 3), c_3 = roll_mean_na(c, 3),
          b_4 = roll_mean_na(b, 4), c_4 = roll_mean_na(c, 4))

Upvotes: 1

Related Questions