Reputation: 310
Create a dataset and the function I want to use
library(data.table)
DT <- data.table(V1=c(rep("A",5),rep("B",5)),
V2=rep(1:5,2),
V3=c(10,10,0,0,0,5,10,0,0,0),
V4=c(0,0,0,2,2,0,0,0,4,4))
testFunction <- function(x, transformation) {
  l <- length(x)
  out <- rep(0, l)
  out[1] <- x[1]
  # seq_len(l)[-1] is empty when l == 1, whereas 2:l would count down to 1
  for (i in seq_len(l)[-1]) {
    # EDIT: the recursion uses the previous output out[i - 1], not x[i - 1]
    out[i] <- x[i] + (1 - transformation) * out[i - 1]
  }
  return(out)
}
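The recurrence in testFunction, out[i] = x[i] + (1 - transformation) * out[i - 1], can also be written as a fold over x. The following is only a sketch using base R's Reduce with accumulate = TRUE; testFunctionReduce is a hypothetical name introduced here for illustration, and it is assumed that x has at least two elements.

```r
# Hypothetical fold-based variant of testFunction (illustration only).
# Reduce() carries the previous output forward as `prev`, so each step
# computes cur + (1 - transformation) * prev, starting from x[1].
testFunctionReduce <- function(x, transformation) {
  Reduce(function(prev, cur) cur + (1 - transformation) * prev,
         x, accumulate = TRUE)
}

# Group "A" values of V3 from the example data
res <- testFunctionReduce(c(10, 10, 0, 0, 0), 0.5)
print(res)
```

This should match the loop version on the example data; whether it is faster has not been measured here.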
Now what I want to do is create a new dataset, newDT, using the information from the application data.frame below:
application<-data.frame(var=c("V3","V3","V4"),
transform=c(0.5,0.9,0.5))
The code I want to end up with is as follows: it creates new variables from the variable names and transformations listed in application, and does so grouped by column V1.
newDT<-DT[,':='(V3_0.5=testFunction(V3,0.5),
V3_0.9=testFunction(V3,0.9),
V4_0.5=testFunction(V4,0.5)),
by="V1"]
It is simple enough to code this up as text using a couple of paste functions, and then passing this to eval(parse(text=....)):
application$code<-paste(application$var,"_",application$transform,"=testFunction(",application$var,",",application$transform,")",sep="")
code<-paste("newDT<-DT[,':='(",paste(application$code,collapse=","),"),by='V1']")
eval(parse(text=code))
However, that runs into an issue once the string passes 4076 characters: (a) I have no idea why, and (b) eval(parse(text=...)) is advised against all over the Runiverse anyway.
The question: How do I go about this?
Happy to look at alternative solutions such as dplyr if speed isn't affected.
EDIT: The output table should look as follows:
    V1 V2 V3 V4  V3_0.5  V3_0.9 V4_0.5
 1:  A  1 10  0 10.0000 10.0000      0
 2:  A  2 10  0 15.0000 11.0000      0
 3:  A  3  0  0  7.5000  1.1000      0
 4:  A  4  0  2  3.7500  0.1100      2
 5:  A  5  0  2  1.8750  0.0110      3
 6:  B  1  5  0  5.0000  5.0000      0
 7:  B  2 10  0 12.5000 10.5000      0
 8:  B  3  0  0  6.2500  1.0500      0
 9:  B  4  0  4  3.1250  0.1050      4
10:  B  5  0  4  1.5625  0.0105      6
Upvotes: 0
Views: 2227
Reputation: 310
Thanks to Chris for providing me with a step in the right direction, with a solution that will work with a single column.
To expand to multiple columns:
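# Turn application into a list with one element per row
# (apply() coerces each row to character, hence as.numeric() below)
applic_list <- unlist(apply(application, 1, list), recursive = FALSE)
# lapply through this list, using .SD to look up the column in question
DT[, (paste(application$var, application$transform, sep = "_")) :=
     lapply(applic_list, function(y) {
       # testFunction(x, transformation): the column goes first, then the rate
       testFunction(.SD[[y[["var"]]]], as.numeric(y[["transform"]]))
     }), by = "V1"]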
returns
    V1 V2 V3 V4  V3_0.5  V3_0.9 V4_0.5
 1:  A  1 10  0 10.0000 10.0000      0
 2:  A  2 10  0 15.0000 11.0000      0
 3:  A  3  0  0  7.5000  1.1000      0
 4:  A  4  0  2  3.7500  0.1100      2
 5:  A  5  0  2  1.8750  0.0110      3
 6:  B  1  5  0  5.0000  5.0000      0
 7:  B  2 10  0 12.5000 10.5000      0
 8:  B  3  0  0  6.2500  1.0500      0
 9:  B  4  0  4  3.1250  0.1050      4
10:  B  5  0  4  1.5625  0.0105      6
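A possible variation, offered only as a sketch: index the rows of application directly rather than going through apply(), which keeps transform numeric and avoids the character round trip. The names new_cols and the use of .SDcols are additions made here for illustration, and stringsAsFactors = FALSE is assumed so that var is a plain character vector.

```r
library(data.table)

DT <- data.table(V1 = c(rep("A", 5), rep("B", 5)),
                 V2 = rep(1:5, 2),
                 V3 = c(10, 10, 0, 0, 0, 5, 10, 0, 0, 0),
                 V4 = c(0, 0, 0, 2, 2, 0, 0, 0, 4, 4))

testFunction <- function(x, transformation) {
  out <- x
  for (i in seq_along(x)[-1]) {
    out[i] <- x[i] + (1 - transformation) * out[i - 1]
  }
  out
}

application <- data.frame(var = c("V3", "V3", "V4"),
                          transform = c(0.5, 0.9, 0.5),
                          stringsAsFactors = FALSE)

# Hypothetical helper name: one new column per row of application
new_cols <- paste(application$var, application$transform, sep = "_")

# One list element per row of application, computed within each V1 group.
# .SDcols restricts .SD to the columns that are actually looked up.
DT[, (new_cols) := lapply(seq_len(nrow(application)), function(i) {
  testFunction(.SD[[application$var[i]]], application$transform[i])
}), by = "V1", .SDcols = unique(application$var)]

print(DT)
```

On the example data this should reproduce the table above; no eval(parse()) is involved, so the string length limit from the question does not come into play.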
Upvotes: 0
Reputation: 6372
Getting down to the core of your issue: you can pass a vector of parameters into lapply and then create the new columns by reference, like this:
library(data.table)
DT <- data.table(col = 1:5)
expon <- function(y,x){x ^ y}
params <- c(1,5,3)
DT[, (paste0("col_", params)) := lapply(params, expon, col)]
This gives you:
   col col_1 col_5 col_3
1:   1     1     1     1
2:   2     2    32     8
3:   3     3   243    27
4:   4     4  1024    64
5:   5     5  3125   125
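Because := works by reference, it adds the columns to DT itself, and an assignment such as newDT <- DT[, ...] would only give a second name for the same table. If a separate newDT is wanted, one option is to copy() first. A minimal sketch, reusing the toy example above:

```r
library(data.table)

DT <- data.table(col = 1:5)
expon <- function(y, x) x ^ y
params <- c(1, 5, 3)

# copy() makes an independent table, so := below leaves DT untouched
newDT <- copy(DT)
newDT[, (paste0("col_", params)) := lapply(params, expon, col)]

print(names(DT))     # original table keeps only its own column
print(names(newDT))  # copy carries the new columns
```

The copy costs memory proportional to the table size, so modifying in place may be preferable when the original is not needed.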
Upvotes: 4