Reputation: 15
I want to get a product of several columns with moving window. Data I use is modified timeseries data.
data <- data.frame(P011020 = 1+sample(100, size = 100, replace = TRUE)/100,P021020 = 1+sample(100, size = 100, replace = TRUE)/100,P031020 = 1+sample(100, size = 100, replace = TRUE)/100,P041020 = 1+sample(100, size = 100, replace = TRUE)/100,P051020 = 1+sample(100, size = 100, replace = TRUE)/100,P061020 = 1+sample(100, size = 100, replace = TRUE)/100,P071020 = 1+sample(100, size = 100, replace = TRUE)/100,P081020 = 1+sample(100, size = 100, replace = TRUE)/100,P091020 = 1+sample(100, size = 100, replace = TRUE)/100,P101020 = 1+sample(100, size = 100, replace = TRUE)/100)
And I want to get tibble where column is equal to product of itself on last seven. In our example it would be P071020=rowprod(P011020:P071020)
, P081020=rowprod(P021020:P081020)
and etc
I've tried using dplyr's rowwise()
and combine it with mutate_at()
, but I don't quite get how to make a moving window for each column.
Upvotes: 0
Views: 149
Reputation: 1065
I think you might be looking for mutate(across(...))
but I may be misunderstanding what you are looking for.
This is still not really ideal... {dplyr} does not really support a column index per se. I feel like this could be easier to implement with base R, without knowing more about your use case.
EDIT: here is a semi-dplyr approach, given the updates you made to OP. I am not 100% sure that this is what you were expecting for output, and given that your sample data are randomly generated there is nothing to compare to for accuracy. I suggested using across()
here, and that essentially is the "verb" you want but I don't think there is a convenient way to reference a variable number of preceding columns, so I use a mix of base R and dplyr below.
dat <- data.frame(P011020 = 1+sample(100, size = 100, replace = TRUE)/100,P021020 = 1+sample(100, size = 100, replace = TRUE)/100,P031020 = 1+sample(100, size = 100, replace = TRUE)/100,P041020 = 1+sample(100, size = 100, replace = TRUE)/100,P051020 = 1+sample(100, size = 100, replace = TRUE)/100,P061020 = 1+sample(100, size = 100, replace = TRUE)/100,P071020 = 1+sample(100, size = 100, replace = TRUE)/100,P081020 = 1+sample(100, size = 100, replace = TRUE)/100,P091020 = 1+sample(100, size = 100, replace = TRUE)/100,P101020 = 1+sample(100, size = 100, replace = TRUE)/100)
## dplyr-ish
library(dplyr, warn.conflicts = FALSE)
# use base R to iterate over numeric column index and re-combine w/ cbind
res <- do.call('cbind', lapply(7:10, function(i) {
rowwise(dat) %>%
select(all_of((i - 6):i)) %>%
apply(1, prod)
}))
dat[,7:10] <- res
head(dat)
#> P011020 P021020 P031020 P041020 P051020 P061020 P071020 P081020 P091020
#> 1 1.03 1.84 1.64 1.14 1.10 1.21 7.640061 9.494445 7.378835
#> 2 1.55 1.86 1.48 1.36 1.18 1.94 15.409445 10.339240 6.003430
#> 3 1.92 1.86 1.41 1.05 1.98 1.33 23.251763 19.134263 14.196389
#> 4 1.97 1.48 1.31 1.11 1.59 1.85 21.699030 16.522104 15.405746
#> 5 1.17 1.43 1.30 1.91 1.71 1.63 21.190120 20.465672 14.741008
#> 6 1.73 1.64 1.91 1.43 1.85 1.45 26.192058 21.498684 24.513743
#> P101020
#> 1 5.309161
#> 2 6.855268
#> 3 11.981349
#> 4 12.583319
#> 5 21.998120
#> 6 22.845268
Created on 2020-11-23 by the reprex package (v0.3.0)
Upvotes: 1