Jason Dealey

Reputation: 310

R: Create new variables in data.table from a separate list of parameters

Create a dataset and the function I want to use

library(data.table)
DT <- data.table(V1=c(rep("A",5),rep("B",5)),
                 V2=rep(1:5,2),
                 V3=c(10,10,0,0,0,5,10,0,0,0),
                 V4=c(0,0,0,2,2,0,0,0,4,4))
testFunction <- function(x, transformation) {
  l <- length(x)
  out <- rep(0, l)
  out[1] <- x[1]
  # seq_len(l)[-1] is 2:l for l >= 2 and empty for l == 1 (2:1 would count down)
  for (i in seq_len(l)[-1]) {
    #out[i] <- x[i] + (1 - transformation) * x[i - 1] #EDIT: Function was wrong
    out[i] <- x[i] + (1 - transformation) * out[i - 1]
  }
  return(out)
}
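
The loop above implements the recursion out[i] = x[i] + (1 - transformation) * out[i - 1], which is a first-order recursive filter. As a minimal sketch (not part of the original post), the same values can probably be obtained from stats::filter with method = "recursive". The helper name testFunctionFilter is hypothetical and only used here for comparison:

```r
# Loop version from the question, repeated so the sketch is self-contained
testFunction <- function(x, transformation) {
  l <- length(x)
  out <- rep(0, l)
  out[1] <- x[1]
  for (i in 2:l) {
    out[i] <- x[i] + (1 - transformation) * out[i - 1]
  }
  return(out)
}

# Hypothetical helper (assumption, not in the original post).
# method = "recursive" computes y[i] = x[i] + filter[1] * y[i - 1],
# with the value before the first element taken as 0.
testFunctionFilter <- function(x, transformation) {
  # stats:: prefix avoids clashes with other packages that define filter()
  as.numeric(stats::filter(x, 1 - transformation, method = "recursive"))
}

x <- c(10, 10, 0, 0, 0)
all.equal(testFunction(x, 0.5), testFunctionFilter(x, 0.5))
```

Whether this is faster than the loop on real data is something to benchmark rather than assume.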

What I want to do now is create a new dataset, newDT, using the information in the application data.frame below:

application<-data.frame(var=c("V3","V3","V4"),
                        transform=c(0.5,0.9,0.5))

The code I want to end up with is as follows: it creates new variables from the variable names and transformations listed in application, and does so grouped by column V1.

newDT<-DT[,':='(V3_0.5=testFunction(V3,0.5),
         V3_0.9=testFunction(V3,0.9),
         V4_0.5=testFunction(V4,0.5)),
   by="V1"]

It is simple enough to build this up as text with a couple of paste calls and then pass it to eval(parse(text=....)):

application$code<-paste(application$var,"_",application$transform,"=testFunction(",application$var,",",application$transform,")",sep="")
code<-paste("newDT<-DT[,':='(",paste(application$code,collapse=","),"),by='V1']")
eval(parse(text=code))

However, that runs into an issue once the string passes 4076 characters: (a) I have no idea why, and (b) eval(parse(text=...)) is advised against all over the Runiverse anyway.

The question: How do I go about this?

Happy to look at alternative solutions such as dplyr if speed isn't affected.

EDIT: The output table should look as follows:

     V1 V2 V3 V4  V3_0.5  V3_0.9 V4_0.5
 1:  A  1 10  0 10.0000 10.0000      0
 2:  A  2 10  0 15.0000 11.0000      0
 3:  A  3  0  0  7.5000  1.1000      0
 4:  A  4  0  2  3.7500  0.1100      2
 5:  A  5  0  2  1.8750  0.0110      3
 6:  B  1  5  0  5.0000  5.0000      0
 7:  B  2 10  0 12.5000 10.5000      0
 8:  B  3  0  0  6.2500  1.0500      0
 9:  B  4  0  4  3.1250  0.1050      4
10:  B  5  0  4  1.5625  0.0105      6

Upvotes: 0

Views: 2227

Answers (2)

Jason Dealey

Reputation: 310

Thanks to Chris for providing me with a step in the right direction, with a solution that will work with a single column.

To expand to multiple columns:

# Turn each row of application into a list element (a named character vector)
applic_list <- unlist(apply(application, 1, list), recursive = FALSE)
# lapply over this list, using .SD to pick out the column in question.
# Argument order matters: testFunction(x, transformation)
DT[, (paste(application$var, application$transform, sep = "_")) :=
    lapply(applic_list, function(y) {
      testFunction(.SD[[y[["var"]]]], as.numeric(y[["transform"]]))
    }), by = "V1"]

returns

    V1 V2 V3 V4  V3_0.5  V3_0.9 V4_0.5
 1:  A  1 10  0 10.0000 10.0000      0
 2:  A  2 10  0 15.0000 11.0000      0
 3:  A  3  0  0  7.5000  1.1000      0
 4:  A  4  0  2  3.7500  0.1100      2
 5:  A  5  0  2  1.8750  0.0110      3
 6:  B  1  5  0  5.0000  5.0000      0
 7:  B  2 10  0 12.5000 10.5000      0
 8:  B  3  0  0  6.2500  1.0500      0
 9:  B  4  0  4  3.1250  0.1050      4
10:  B  5  0  4  1.5625  0.0105      6
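
The apply() step above turns every row of application into character strings, which is why as.numeric() is needed afterwards. A possible variation, sketched here as an assumption rather than as the method used in this answer, walks the two columns in parallel with Map() so the transform values stay numeric. The name new_cols is introduced only for this illustration:

```r
library(data.table)

# Data and function from the question, repeated so the sketch is self-contained
DT <- data.table(V1 = c(rep("A", 5), rep("B", 5)),
                 V2 = rep(1:5, 2),
                 V3 = c(10, 10, 0, 0, 0, 5, 10, 0, 0, 0),
                 V4 = c(0, 0, 0, 2, 2, 0, 0, 0, 4, 4))
testFunction <- function(x, transformation) {
  l <- length(x)
  out <- rep(0, l)
  out[1] <- x[1]
  for (i in 2:l) {
    out[i] <- x[i] + (1 - transformation) * out[i - 1]
  }
  return(out)
}
application <- data.frame(var = c("V3", "V3", "V4"),
                          transform = c(0.5, 0.9, 0.5))

# Hypothetical name for the vector of new column names
new_cols <- paste(application$var, application$transform, sep = "_")

# Map() pairs each variable name with its transformation;
# as.character() guards against var being stored as a factor,
# unname() drops the names Map() would otherwise copy from var.
DT[, (new_cols) := unname(Map(function(v, tr) testFunction(.SD[[v]], tr),
                              as.character(application$var),
                              application$transform)),
   by = "V1"]
```

Because no code string is built, the length of application does not run into the character limit mentioned in the question.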

Upvotes: 0

Chris

Reputation: 6372

Down to the core of your issue, you can pass a vector of parameters into lapply, and then create new columns by reference like this:

library(data.table)

DT <- data.table(col = 1:5)
expon <-  function(y,x){x ^ y}
params <- c(1,5,3)

DT[, (paste0("col_", params)) := lapply(params, expon, col)]

This gives you:

   col col_1 col_5 col_3
1:   1     1     1     1
2:   2     2    32     8
3:   3     3   243    27
4:   4     4  1024    64
5:   5     5  3125   125
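
One consequence of creating columns "by reference" is that := changes DT itself, so assigning the result to another name (as with newDT in the question) does not give a separate table. A minimal sketch of the difference, assuming an untouched original is wanted; the names sameDT, DT2 and col_sq are made up for this example:

```r
library(data.table)

DT <- data.table(col = 1:5)

# Plain assignment does not copy a data.table: both names refer to the same object
sameDT <- DT
sameDT[, col_sq := col^2]
"col_sq" %in% names(DT)    # the original gained the column as well

# copy() makes an independent table, so := on the copy leaves the original alone
DT2 <- data.table(col = 1:5)
newDT <- copy(DT2)[, col_sq := col^2]
"col_sq" %in% names(DT2)   # the original is unchanged
```

If modifying DT in place is acceptable, the copy() step can simply be dropped.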

Upvotes: 4
