tagahasi
tagahasi

Reputation: 15

Product of several columns with rolling window using dplyr

I want to get a product of several columns with moving window. Data I use is modified timeseries data.

data <- data.frame(P011020 = 1+sample(100, size = 100, replace = TRUE)/100,P021020 = 1+sample(100, size = 100, replace = TRUE)/100,P031020 = 1+sample(100, size = 100, replace = TRUE)/100,P041020 = 1+sample(100, size = 100, replace = TRUE)/100,P051020 = 1+sample(100, size = 100, replace = TRUE)/100,P061020 = 1+sample(100, size = 100, replace = TRUE)/100,P071020 = 1+sample(100, size = 100, replace = TRUE)/100,P081020 = 1+sample(100, size = 100, replace = TRUE)/100,P091020 = 1+sample(100, size = 100, replace = TRUE)/100,P101020 = 1+sample(100, size = 100, replace = TRUE)/100)

And I want to get tibble where column is equal to product of itself on last seven. In our example it would be P071020=rowprod(P011020:P071020), P081020=rowprod(P021020:P081020) and etc

I've tried using dplyr's rowwise() and combine it with mutate_at(), but I don't quite get how to make a moving window for each column. enter image description here

Upvotes: 0

Views: 149

Answers (1)

Andrew Brown
Andrew Brown

Reputation: 1065

I think you might be looking for mutate(across(...)) but I may be misunderstanding what you are looking for.

This is still not really ideal... {dplyr} does not really support a column index per se. I feel like this could be easier to implement with base R, without knowing more about your use case.

EDIT: here is a semi-dplyr approach, given the updates you made to OP. I am not 100% sure that this is what you were expecting for output, and given that your sample data are randomly generated there is nothing to compare to for accuracy. I suggested using across() here, and that essentially is the "verb" you want but I don't think there is a convenient way to reference a variable number of preceding columns, so I use a mix of base R and dplyr below.

dat <- data.frame(P011020 = 1+sample(100, size = 100, replace = TRUE)/100,P021020 = 1+sample(100, size = 100, replace = TRUE)/100,P031020 = 1+sample(100, size = 100, replace = TRUE)/100,P041020 = 1+sample(100, size = 100, replace = TRUE)/100,P051020 = 1+sample(100, size = 100, replace = TRUE)/100,P061020 = 1+sample(100, size = 100, replace = TRUE)/100,P071020 = 1+sample(100, size = 100, replace = TRUE)/100,P081020 = 1+sample(100, size = 100, replace = TRUE)/100,P091020 = 1+sample(100, size = 100, replace = TRUE)/100,P101020 = 1+sample(100, size = 100, replace = TRUE)/100)

## dplyr-ish
library(dplyr, warn.conflicts = FALSE)

# use base R to iterate over numeric column index and re-combine w/ cbind
res <- do.call('cbind', lapply(7:10, function(i) {
  rowwise(dat) %>% 
    select(all_of((i - 6):i)) %>% 
    apply(1, prod)
}))

dat[,7:10] <- res
head(dat)
#>   P011020 P021020 P031020 P041020 P051020 P061020   P071020   P081020   P091020
#> 1    1.03    1.84    1.64    1.14    1.10    1.21  7.640061  9.494445  7.378835
#> 2    1.55    1.86    1.48    1.36    1.18    1.94 15.409445 10.339240  6.003430
#> 3    1.92    1.86    1.41    1.05    1.98    1.33 23.251763 19.134263 14.196389
#> 4    1.97    1.48    1.31    1.11    1.59    1.85 21.699030 16.522104 15.405746
#> 5    1.17    1.43    1.30    1.91    1.71    1.63 21.190120 20.465672 14.741008
#> 6    1.73    1.64    1.91    1.43    1.85    1.45 26.192058 21.498684 24.513743
#>     P101020
#> 1  5.309161
#> 2  6.855268
#> 3 11.981349
#> 4 12.583319
#> 5 21.998120
#> 6 22.845268

Created on 2020-11-23 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions