anonymitySet
anonymitySet

Reputation: 23

Rolling covariance (or other function) between two columns of a dataset, by group

So, I have a data.table like this, in long format

#sample data
sample_size = 10
DT0 <- data.table(
  YEAR = seq(2021, by=-1, length.out = sample_size),
  a1 = seq(5, by=0.035, length.out = sample_size),
  a2 = seq(12, by=0.6, length.out = sample_size),
  a3 = seq(10, by=0.01, length.out = sample_size)
)

#melting to long size
DT <- melt(DT0, 
           id.vars = c("YEAR"), 
           variable.name = "ITEM",
           value.name = "VARIATION")

setkeyv(DT, cols=c("ITEM", "YEAR"))
> print(DT, 100)
     YEAR   ITEM VARIATION
    <num> <fctr>     <num>
 1:  2021     a1     5.000
 2:  2020     a1     5.035
 3:  2019     a1     5.070
 4:  2018     a1     5.105
 5:  2017     a1     5.140
 6:  2016     a1     5.175
 7:  2015     a1     5.210
 8:  2014     a1     5.245
 9:  2013     a1     5.280
10:  2012     a1     5.315
11:  2021     a2    12.000
12:  2020     a2    12.600
13:  2019     a2    13.200
14:  2018     a2    13.800
15:  2017     a2    14.400
16:  2016     a2    15.000
17:  2015     a2    15.600
18:  2014     a2    16.200
19:  2013     a2    16.800
20:  2012     a2    17.400
21:  2021     a3    10.000
22:  2020     a3    10.010
23:  2019     a3    10.020
24:  2018     a3    10.030
25:  2017     a3    10.040
26:  2016     a3    10.050
27:  2015     a3    10.060
28:  2014     a3    10.070
29:  2013     a3    10.080
30:  2012     a3    10.090
     YEAR   ITEM VARIATION

And I'm trying to compute news columns with the rolling (let's assume n = 5) covariance, pair-wise for each element of ITEM (a1, a2, a3)

I tried to manually create each pair, performing a joint DT[DT[ITEM==(a1,a2,a3)] and using cov(a,b) with the rolling function data.table::frollapply, something like this below:

#joint
DT2 <- DT[DT[ITEM == "a1"], on=.(YEAR)]
> print(DT2, 100)
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
    <num> <fctr>     <num> <fctr>       <num>
 1:  2012     a1     5.315     a1       5.315
 2:  2012     a2    17.400     a1       5.315
 3:  2012     a3    10.090     a1       5.315
 4:  2013     a1     5.280     a1       5.280
 5:  2013     a2    16.800     a1       5.280
 6:  2013     a3    10.080     a1       5.280
 7:  2014     a1     5.245     a1       5.245
 8:  2014     a2    16.200     a1       5.245
 9:  2014     a3    10.070     a1       5.245
10:  2015     a1     5.210     a1       5.210
11:  2015     a2    15.600     a1       5.210
12:  2015     a3    10.060     a1       5.210
13:  2016     a1     5.175     a1       5.175
14:  2016     a2    15.000     a1       5.175
15:  2016     a3    10.050     a1       5.175
16:  2017     a1     5.140     a1       5.140
17:  2017     a2    14.400     a1       5.140
18:  2017     a3    10.040     a1       5.140
19:  2018     a1     5.105     a1       5.105
20:  2018     a2    13.800     a1       5.105
21:  2018     a3    10.030     a1       5.105
22:  2019     a1     5.070     a1       5.070
23:  2019     a2    13.200     a1       5.070
24:  2019     a3    10.020     a1       5.070
25:  2020     a1     5.035     a1       5.035
26:  2020     a2    12.600     a1       5.035
27:  2020     a3    10.010     a1       5.035
28:  2021     a1     5.000     a1       5.000
29:  2021     a2    12.000     a1       5.000
30:  2021     a3    10.000     a1       5.000
     YEAR   ITEM VARIATION i.ITEM i.VARIATION
#computing cov pairs for "a1": cov(a1, a1); cov(a2, a1) and cov(a3, a1)..

DT2[, 
    "Cov(ITEM, a1)" := frollapply(.SD, n=5, FUN=cov(x= VARIATION, y= i.VARIATION)),
    by=.(ITEM)]

But I get this result:

>Error in match.fun(FUN) : 
  'cov(x = VARIATION, y = i.VARIATION)' is not a function, character or symbol

EDIT: Tried the @IRTFM sugestion, by doing:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = VARIATION, y = i.VARIATION)}),    by = .(ITEM)]

and got this error:

Error in frollapply(.SD, n = 5, FUN = function(x, y) { :   x must be list, data.frame or data.table of numeric or logical types

The elements of VARIATION and i.VARIATION are all numeric, so, I tried to return them as list by doing:

DT2[ , cov_1_x := frollapply(.SD, n = 5, FUN = function(x,y) {cov(x = .(VARIATION), y = .(i.VARIATION))}),    by = .(ITEM)]

But the same error is returned.

Do you have any tips or suggestions on how to correctly do this using frollapply?

Upvotes: 0

Views: 218

Answers (1)

mustafa korucu
mustafa korucu

Reputation: 1

I think something like this might work hope it helps

library(data.table)
 
    sample_size <- 10
    DT0 <- data.table(
      YEAR = seq(2021, by = -1, length.out = sample_size),
      a1 = seq(5, by = 0.035, length.out = sample_size),
      a2 = seq(12, by = 0.6, length.out = sample_size),
      a3 = seq(10, by = 0.01, length.out = sample_size)
    )
    
    DT <- melt(DT0, id.vars = "YEAR", variable.name = "ITEM", value.name = "VARIATION")
    setkey(DT, ITEM, YEAR)
  
    items <- unique(DT$ITEM)
    pairs <- CJ(item1 = items, item2 = items)
 
    cov_answer <- pairs[, {
      DT1 <- DT[ITEM == item1]
      DT2 <- DT[ITEM == item2]
      DT_merged <- DT1[DT2, on = "YEAR", nomatch = 0][, .(YEAR, item1, item2, VARIATION, i.VARIATION)]
    
    
      DT_merged[, cov := frollapply(.SD, n = 5, FUN = function(x) cov(x[, 1], x[, 2]), na.rm = TRUE), .SDcols = c("VARIATION", "i.VARIATION")]
    }, by = .(item1, item2)]

Upvotes: 0

Related Questions