Reputation: 105
I have a huge data.table dt (almost 1.5 million rows). Let's say I want to apply a user-defined function growth.ls to its rows, where scols (some of the columns in dt) supply the arguments:
growth.ls <- function(values) {
  # values that cannot be logged (non-finite or non-positive) yield NA
  if (any(!is.finite(values)) || any(values <= 0)) return(NA_real_)
  # fit log(values) against the time index and turn the slope into a % growth rate
  (exp(lm(log(values) ~ seq_along(values))$coefficients[[2]]) - 1) * 100
}
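To illustrate what the function returns (on made-up numbers, not my real data): a series growing by exactly 10% per period gives back 10, and a series containing a non-positive value gives NA.

growth.ls(c(100, 110, 121, 133.1))  # ~10: a 10% per-period growth rate
growth.ls(c(100, 0, 121))           # NA_real_, because of the non-positive value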
dt[, var := growth.ls(as.numeric(.SD)), .SDcols = scols, by = 1:nrow(dt)]
This process takes a very long time, and I do not know whether the problem is growth.ls itself or the fact that I am grouping with by = 1:nrow(dt).
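For context, a stripped-down version of the setup (hypothetical columns x1-x3 standing in for scols, and far fewer rows than the real 1.5 million) would be:

library(data.table)
# toy stand-in for the real table
dt <- data.table(x1 = runif(1e4, 1, 2), x2 = runif(1e4, 1, 2), x3 = runif(1e4, 1, 2))
scols <- c("x1", "x2", "x3")
# by = 1:nrow(dt) makes one group per row, so growth.ls is called nrow(dt) times
dt[, var := growth.ls(as.numeric(.SD)), .SDcols = scols, by = 1:nrow(dt)]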
Upvotes: 1
Views: 690
Reputation: 21749
What about this (using multiple cores with data.table via the parallel package):
library(parallel)
cl <- makeCluster(detectCores())
# the columns whose names start with 'x' are the ones fed to growth.ls
choose_cols <- startsWith(colnames(df), 'x')
# parApply() splits the row-wise growth.ls calls across the cluster workers
df[, growth := unlist(parApply(cl, .SD, 1, growth.ls)), .SDcols = choose_cols]
stopCluster(cl)
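With the question's own objects (dt, scols, and the var column), and assuming the cluster cl above is still open, the same call would look roughly like:

dt[, var := unlist(parApply(cl, .SD, 1, growth.ls)), .SDcols = scols]

Note that, as with apply(), parApply() coerces .SD to a matrix, so the selected columns should all be numeric; otherwise everything is converted to character and growth.ls just returns NA.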
Upvotes: 1