Will

Reputation: 942

pairwise operations on data.table

I'm trying to calculate the rolling correlations between pairs of columns, then fit n linear models on all columns to predict those correlations, and then use the models to predict correlations (which I then use to generate correlated random a, b, c, d, but that's not really relevant).
I have done it, but the code is now really repetitive, and I want to know how to do it dynamically to avoid the repetition. I'm using data.table because performance is key, as the table could be huge.
This is what I have at the moment:

library(data.table)

data <-
  data.table(a = rnorm(20),
             b = rnorm(20),
             c = rnorm(20),
             d = rnorm(20),
             logret_a = rnorm(20),
             logret_b = rnorm(20),
             logret_c = rnorm(20),
             logret_d = rnorm(20))

corr <- function(y) cor(y[, 1], y[, 2])
corperiod <- 3

#1 here I would like one line instead of one line per pair (ab, ac, ad, bc, bd, cd)
data$cor_ab <- c(NA, zoo::rollapplyr(data[-1, .(logret_a, logret_b)], corperiod, corr, by.column = FALSE, fill = NA))
data$cor_ac <- c(NA, zoo::rollapplyr(data[-1, .(logret_a, logret_c)], corperiod, corr, by.column = FALSE, fill = NA))
data$cor_ad <- c(NA, zoo::rollapplyr(data[-1, .(logret_a, logret_d)], corperiod, corr, by.column = FALSE, fill = NA))
# and so on...

#2 here I would like two lines instead of two lines per pair (ab, ac, ad, bc, bd, cd)
fit_cor_ab <- lm(data=data, cor_ab ~ a + b + c + d + logret_a + logret_b + logret_c +  logret_d)
fit_cor_ab <- MASS::stepAIC(fit_cor_ab, direction="both", trace = FALSE)

fit_cor_ac <- lm(data=data, cor_ac ~ a + b + c + d + logret_a + logret_b + logret_c +  logret_d)
fit_cor_ac <- MASS::stepAIC(fit_cor_ac, direction="both", trace = FALSE)
# and so on...

simul <- as.data.table(matrix(0, nrow=100, ncol=ncol(data)))
colnames(simul) <- colnames(data)
simul[1] <- data[1]
# skipping for loop
i <- 2
#3 here I would like one line instead of one line per pair (ab, ac, ad, bc, bd, cd)
simul[i, cor_ab := predict(fit_cor_ab, newdata = simul[i-1])]
simul[i, cor_ac := predict(fit_cor_ac, newdata = simul[i-1])]
# and so on...

What I would like is a way to do the three operations flagged as #1, #2, and #3 for any number of columns in the data.table data, regardless of the column names. My code works, but I have to amend it every time I change the contents of data, which isn't nice.

Any help appreciated!

Upvotes: 0

Views: 139

Answers (1)

Vincent

Reputation: 17725

Seems like a good candidate for a simple function and a nested for loop. Did you try something like this? Notice that the calculate_cor function modifies the global data in place (it adds the new column by reference with :=), so you don't need to return anything.

# correlations
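calculate_cor <- function(x, y){
  # correlation between the two columns of one rolling window
  corr <- function(w) cor(w[, 1], w[, 2])
  corperiod <- 3
  newcol <- paste0("cor_", x, y)
  x <- paste0("logret_", x)
  y <- paste0("logret_", y)
  # drop the first row, as in the question; the result is padded with NA below
  tmp <- data[-1, .SD, .SDcols=c(x, y)]
  out <- zoo::rollapplyr(tmp, corperiod, corr,
                         by.column=FALSE, fill = NA)
  # := adds the column to the global data by reference
  data[, (newcol) := c(NA, out)]
}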

cols <- c("a", "b", "c", "d")
for (x in cols) {
  for (y in cols) {
    if (x < y) {
      calculate_cor(x, y)
    }
  }
}

# regressions
fit <- list()
cols <- grep("^cor_", colnames(data), value=TRUE)
for (x in cols) {
  f <- as.formula(paste(x, "~a+b+c+d+logret_a+logret_b+logret_c+logret_d"))
  mod <- lm(f, data)
  fit[[x]] <- MASS::stepAIC(mod, direction="both", trace=FALSE)
}
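
If the column names should not be hard-coded, the pairs and the formulas could be derived from the names themselves. The following is only a sketch under assumptions: it assumes every series has a matching logret_ column, and the names col_names, stems, pairs, cor_names and formulas are made up for illustration. It uses base R only (combn and reformulate).

```r
# Column names as in the question's table (before any cor_ columns are added).
col_names <- c("a", "b", "c", "d",
               "logret_a", "logret_b", "logret_c", "logret_d")

# Recover the stems ("a", "b", ...) from the logret_ columns instead of typing them.
stems <- sub("^logret_", "", grep("^logret_", col_names, value = TRUE))

# combn() yields each unordered pair exactly once, so no x < y guard is needed.
pairs <- combn(stems, 2, simplify = FALSE)
cor_names <- vapply(pairs, function(p) paste0("cor_", p[1], p[2]), character(1))

# With calculate_cor() from above defined, the nested loop would become:
# for (p in pairs) calculate_cor(p[1], p[2])

# One formula per correlation column, with every original column as a predictor.
formulas <- lapply(cor_names, function(r) reformulate(col_names, response = r))
names(formulas) <- cor_names
print(formulas[["cor_ab"]])
```

In the regression loop, f <- formulas[[x]] could then stand in for the as.formula(paste(...)) line, so the predictor list no longer needs editing when the columns change.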

I don't quite understand what you are trying to accomplish with the 3rd step, but hopefully the same logic as above applies.
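
If the 3rd step is just "fill row i of simul with each model's prediction from row i-1", a loop over the names of the fit list might be enough. This is a minimal sketch on toy data, not the question's real table: the two cor_ columns here are random stand-ins, the predictors are reduced to a and b, and stepAIC is left out to keep it short. data.table::set() is used to assign by reference with a column name held in a variable.

```r
library(data.table)

set.seed(1)
# Toy stand-in for the question's table: two predictors, two made-up correlation columns.
data <- data.table(a = rnorm(20), b = rnorm(20),
                   cor_ab = runif(20, -1, 1), cor_ac = runif(20, -1, 1))

# One model per correlation column, stored in a named list as in the regression loop.
fit <- list()
for (r in c("cor_ab", "cor_ac")) {
  fit[[r]] <- lm(reformulate(c("a", "b"), response = r), data = data)
}

# Simulation table with the same columns; the first row is seeded from the data.
simul <- as.data.table(matrix(0, nrow = 100, ncol = ncol(data)))
setnames(simul, colnames(data))
set(simul, i = 1L, j = names(simul), value = as.list(data[1]))

# Inside the simulation loop: predict every correlation for row i from row i - 1.
i <- 2L
for (nm in names(fit)) {
  pred <- as.numeric(predict(fit[[nm]], newdata = simul[i - 1L]))
  set(simul, i = i, j = nm, value = pred)
}
```

Because the loop runs over names(fit), it picks up whatever correlation columns were fitted, without naming cor_ab, cor_ac and so on explicitly.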

Upvotes: 1
