Reputation: 1472
I'm trying to apply a rolling function to a data.table by group and am running into problems. I'm not sure whether I should change my function or whether my call is wrong. Here is a simple example:
Data:
library(data.table)
library(zoo)

test <- data.table(return = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2),
                   sec = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
My function:
zoo_fun <- function(dt, N) {
  (rollapply(dt$return + 1, N, FUN = prod, fill = NA, align = 'right') - 1)
}
Running it (I want to create a new column momentum3, which for each security is the product of one plus each of the latest 3 observations, minus one, so grouping by sec):
test[, momentum3 := zoo_fun(test, 3), by=sec]
Warning messages:
1: In `[.data.table`(test, , `:=`(momentum3, zoo_fun(test, 3)), by = sec) :
RHS 1 is length 10 (greater than the size (5) of group 1). The last 5 element(s) will be discarded.
2: In `[.data.table`(test, , `:=`(momentum3, zoo_fun(test, 3)), by = sec) :
RHS 1 is length 10 (greater than the size (5) of group 2). The last 5 element(s) will be discarded.
I get those warnings, and the result is not what I expected:
> test
return sec momentum3
1: 0.1 A NA
2: 0.1 A NA
3: 0.1 A 0.331
4: 0.1 A 0.331
5: 0.1 A 0.331
6: 0.2 B NA
7: 0.2 B NA
8: 0.2 B 0.331
9: 0.2 B 0.331
10: 0.2 B 0.331
I was expecting sec B to be filled with 0.728 ((1.2 * 1.2 * 1.2) - 1), with two NAs at the start. What am I doing wrong? Is it that rolling functions won't work with grouping? For reference, the expected value for B checks out directly:
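prod(1 + c(0.2, 0.2, 0.2)) - 1
# [1] 0.728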
Upvotes: 1
Views: 2934
Reputation: 42592
This answer suggested using Reduce() and shift() for rolling window problems with data.table. This benchmark showed that this can be considerably faster than zoo::rollapply().
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
# return sec momentum
# 1: 0.1 A NA
# 2: 0.1 A NA
# 3: 0.1 A 0.331
# 4: 0.1 A 0.331
# 5: 0.1 A 0.331
# 6: 0.2 B NA
# 7: 0.2 B NA
# 8: 0.2 B 0.728
# 9: 0.2 B 0.728
#10: 0.2 B 0.728
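To see the mechanics: shift(x, 0:2) returns a list holding x together with its first two lags, and Reduce(`*`, ...) multiplies these vectors elementwise, so each position ends up with the product of the current and the two preceding values. A minimal illustration on a plain vector (the B group's gross returns):

library(data.table)
x <- 1 + c(0.2, 0.2, 0.2, 0.2, 0.2)
# list of x, x lagged by 1, and x lagged by 2 (NA where a lag runs off the start)
shift(x, 0:2, type = "lag")
# elementwise product across the three vectors, minus one
Reduce(`*`, shift(x, 0:2, type = "lag")) - 1
# [1]    NA    NA 0.728 0.728 0.728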
Benchmarking both approaches on the OP's small data set:
microbenchmark::microbenchmark(
zoo = test[, momentum := zoo_fun(return, 3), by = sec][],
red = test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][],
times = 100L
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# zoo 2318.209 2389.131 2445.1707 2421.541 2466.1930 3108.382 100 b
# red 562.465 625.413 663.4893 646.880 673.4715 1094.771 100 a
To verify that the benchmark results from the small data set carry over, a larger data set is constructed:
n_rows <- 1e4
test0 <- data.table(return = rep(as.vector(outer(1:5/100, 1:2/10, "+")), n_rows),
                    sec = rep(rep(c("A", "B"), each = 5L), n_rows))
test0
# return sec
# 1: 0.11 A
# 2: 0.12 A
# 3: 0.13 A
# 4: 0.14 A
# 5: 0.15 A
# ---
# 99996: 0.21 B
# 99997: 0.22 B
# 99998: 0.23 B
# 99999: 0.24 B
#100000: 0.25 B
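The return values are built with outer(), which forms all pairwise sums of 1:5/100 and 1:2/10; flattened column-wise, this gives the five A returns followed by the five B returns:

as.vector(outer(1:5/100, 1:2/10, "+"))
# [1] 0.11 0.12 0.13 0.14 0.15 0.21 0.22 0.23 0.24 0.25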
As test is being modified in place, each benchmark run starts with a fresh copy of test0.
microbenchmark::microbenchmark(
copy = test <- copy(test0),
zoo = {
test <- copy(test0)
test[, momentum := zoo_fun(return, 3), by = sec][]
},
red = {
test <- copy(test0)
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
},
times = 10L
)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# copy 282.619 294.512 325.3261 298.424 350.272 414.983 10 a
# zoo 1129601.974 1144346.463 1188484.0653 1162598.499 1194430.395 1337727.279 10 b
# red 3354.554 3439.095 6135.8794 5002.008 7695.948 11443.595 10 a
For 100k rows, the Reduce()/shift() approach is more than 200 times faster than zoo::rollapply().
Apparently, there are different interpretations of what the expected result is.
To investigate this, a modified data set is used:
test <- data.table(return = c(0.1, 0.11, 0.12, 0.13, 0.14, 0.21, 0.22, 0.23, 0.24, 0.25),
                   sec = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B"))
test
# return sec
# 1: 0.10 A
# 2: 0.11 A
# 3: 0.12 A
# 4: 0.13 A
# 5: 0.14 A
# 6: 0.21 B
# 7: 0.22 B
# 8: 0.23 B
# 9: 0.24 B
#10: 0.25 B
Note that the return values within each group vary, unlike the OP's data set where the return values within each sec group are constant.
With this, the accepted answer (rollapply()) returns:
test[, momentum := zoo_fun(return, 3), by = sec][]
# return sec momentum
# 1: 0.10 A NA
# 2: 0.11 A NA
# 3: 0.12 A 0.367520
# 4: 0.13 A 0.404816
# 5: 0.14 A 0.442784
# 6: 0.21 B NA
# 7: 0.22 B NA
# 8: 0.23 B 0.815726
# 9: 0.24 B 0.860744
#10: 0.25 B 0.906500
Henrik's answer computes the product over only the last three rows of each group and assigns that single value to those rows:
test[test[ , tail(.I, 3), by = sec]$V1, res := prod(return + 1) - 1, by = sec][]
# return sec res
# 1: 0.10 A NA
# 2: 0.11 A NA
# 3: 0.12 A 0.442784
# 4: 0.13 A 0.442784
# 5: 0.14 A 0.442784
# 6: 0.21 B NA
# 7: 0.22 B NA
# 8: 0.23 B 0.906500
# 9: 0.24 B 0.906500
#10: 0.25 B 0.906500
The Reduce()/shift() solution returns:
test[, momentum := Reduce(`*`, shift(return + 1.0, 0:2, type="lag")) - 1, by = sec][]
# return sec momentum
# 1: 0.10 A NA
# 2: 0.11 A NA
# 3: 0.12 A 0.367520
# 4: 0.13 A 0.404816
# 5: 0.14 A 0.442784
# 6: 0.21 B NA
# 7: 0.22 B NA
# 8: 0.23 B 0.815726
# 9: 0.24 B 0.860744
#10: 0.25 B 0.906500
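As a cross-check, the two rolling results can be compared programmatically; momentum2 below is just an illustrative name for a second column holding the rollapply() result:

test[, momentum2 := zoo_fun(return, 3), by = sec]
all.equal(test$momentum, test$momentum2)
# [1] TRUE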
Upvotes: 5
Reputation: 37889
When you use dt$return, the whole data.table is picked up internally within each group rather than just that group's rows. Just pass the column you need to the function and it will work fine:
# use the column instead of the whole data.table
zoo_fun <- function(column, N) {
  (rollapply(column + 1, N, FUN = prod, fill = NA, align = 'right') - 1)
}
#now it works fine
test[, momentum := zoo_fun(return, 3), by = sec]
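To see the length mismatch behind the original warnings, compare what each group sees: the bare column name yields only that group's rows, while test$return always yields the full column (group_len and full_len are just illustrative names):

test[, .(group_len = length(return), full_len = length(test$return)), by = sec]
#    sec group_len full_len
# 1:   A         5       10
# 2:   B         5       10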
As a separate note, you should probably not use return as a column or variable name, since it clashes with base R's return().
Output:
> test
return sec momentum
1: 0.1 A NA
2: 0.1 A NA
3: 0.1 A 0.331
4: 0.1 A 0.331
5: 0.1 A 0.331
6: 0.2 B NA
7: 0.2 B NA
8: 0.2 B 0.728
9: 0.2 B 0.728
10: 0.2 B 0.728
Upvotes: 3