andrey
andrey

Reputation: 39

R rolling sum with several windows, by groups

Given the following table:

library(data.table)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))

   value grp
1:     3   1
2:     1   1
3:     5   1
4:     6   2
5:     2   2
6:     5   3
7:    12   3
8:     6   3

I want to add 3 new columns such that each of them is the rolling sum of column 'value', grouped by column 'grp'. This is the config table which contains the windows length and name for each of these new columns:

rolling_conf <- data.table(name=c("2d", "4d", "7d"), window = c(1,2,2))

   name window
1:   2d      1
2:   4d      2
3:   7d      2

I was able to implement this task using for loop:

library(RcppRoll)
for(i in 1:nrow(rolling_conf)){
  df[ , rolling_conf$name[i] := roll_sumr(value, rolling_conf$window[i], na.rm=T), grp]
}

This is the output I get (this is the desirable output):

   value grp 2d 4d 7d
1:     3   1  3 NA NA
2:     1   1  1  4  4
3:     5   1  5  6  6
4:     6   2  6 NA NA
5:     2   2  2  8  8
6:     5   3  5 NA NA
7:    12   3 12 17 17
8:     6   3  6 18 18

I'm looking for a faster way of implementation (make it run in parallel and not in sequence). I don't want to use foreach. I guess the apply family is the way to go but I did not manage to write such a piece of code.

Your help is appreciated!

Upvotes: 1

Views: 538

Answers (2)

see-king_of_knowledge
see-king_of_knowledge

Reputation: 523

Here's a solution where I use lapply:

library(data.table)
library(RcppRoll)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))
rolling_conf <- list("2d" = 1, "4d"= 2, "7d" = 2)
dff <- split(df$value, df$grp)

dfl <- lapply(dff, function(y) sapply(rolling_conf, function(x) roll_sumr(y, x, na.rm=T)))


dfl <- do.call(rbind, dfl)
dfl
#      2d 4d 7d
# [1,]  3 NA NA
# [2,]  1  4  4
# [3,]  5  6  6
# [4,]  6 NA NA
# [5,]  2  8  8
# [6,]  5 NA NA
# [7,] 12 17 17
# [8,]  6 18 18


cbind(df,dfl)
#    value grp 2d 4d 7d
# 1:     3   1  3 NA NA
# 2:     1   1  1  4  4
# 3:     5   1  5  6  6
# 4:     6   2  6 NA NA
# 5:     2   2  2  8  8
# 6:     5   3  5 NA NA
# 7:    12   3 12 17 17
# 8:     6   3  6 18 18

Upvotes: 1

DanY
DanY

Reputation: 6073

One version that uses sapply() to avoid a manual loop:

library(data.table)
library(RcppRoll)

# create datasets
dt <- data.table(value=c(3,1,5,6,2,5,12,6), grp=c(1,1,1,2,2,3,3,3))
rc <- data.table(name=c("2d", "4d", "7d"), window=c(1,2,2))

# implement rolling sum according various window lengths
result <- sapply(as.list(rc$window), function(x) dt[ , roll_sumr(value, x, na.rm=T), by=grp][[2]])

# add back to dataset with correct column names
colnames(result) <- rc$name
cbind(dt, result)

Upvotes: 1

Related Questions