Reputation: 5169
I have a function that returns more than one value. I need to use it in ddply
but I want to avoid calling the function multiple times. Here's a mock-up example:
library(plyr)
ff = function(i) {
return(c(min(i),max(i)))
}
set.seed(12345)
id = c(rep(1:3,4))
x = sample(1:10, 12, replace=T)
df = data.frame(id,x)
res = ddply(df,.(id),summarise,val1 = min(x), val2 = max(x), val3 = ff(x)[1], val4 = ff(x)[2])
View(res)
  id val1 val2 val3 val4
1  1    4   10    4   10
2  2    1    9    1    9
3  3    2    8    2    8
As expected, val3 = val1 and val4 = val2. But I have to call function ff twice in ddply, which is not optimal time-wise. Is there a way to assign both function outputs within ddply from a single call? If I try to use [1:2] or similar, I get an error: Error in eval(expr, envir, enclos) : length(rows) == 1 is not TRUE
Thanks!
Edit. Thanks to all contributors! David's solution ran about 2 times faster, and it also allows further operations on the intermediate results. Here is updated code that is fully reproducible.
library(plyr)
library(data.table)
library(microbenchmark)
ff = function(i) {
return(c(min(i),max(i)))
}
set.seed(12345)
id = c(rep(1:3,4000))
x = runif(12000,1,10)
df = data.frame(id,x)
View(df)
res = ddply(df,.(id),summarise,val1 = min(x), val2 = max(x), val3 = ff(x)[1], val4 = ff(x)[2], val5 = val3+val4, val6 = val3/val4)
View(res)
res2 = setDT(df)[, as.list(c(val1 = min(x), val2 = max(x), val3 = ff(x))), .(id)][, val5 := val31+val32][, val6 := val31/val32]
View(res2)
print(microbenchmark(ddply(df,.(id),summarise,val1 = min(x), val2 = max(x), val3 = ff(x)[1], val4 = ff(x)[2], val5 = val3+val4, val6 = val3/val4), times = 100))
print(microbenchmark(setDT(df)[, as.list(c(val1 = min(x), val2 = max(x), val3 = ff(x))), .(id)][, val5 := val31+val32][, val6 := val31/val32],times=100))
Results:
Unit: milliseconds
expr
ddply(df, .(id), summarise, val1 = min(x), val2 = max(x), val3 = ff(x)[1], val4 = ff(x)[2], val5 = val3 + val4, val6 = val3/val4)
min lq mean median uq max neval
3.042616 3.185358 5.976851 3.409828 3.925104 45.5157 100
Unit: milliseconds
expr
setDT(df)[, as.list(c(val1 = min(x), val2 = max(x), val3 = ff(x))), .(id)][, `:=`(val5, val31 + val32)][, `:=`(val6, val31/val32)]
min lq mean median uq max neval
1.968349 2.071747 2.285368 2.124206 2.251171 12.62967 100
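The column names val31 and val32 used in the data.table call above are not defined anywhere explicitly. They come from how base R's c() names the elements of a tagged argument that has more than one element. A minimal sketch in base R only, where ff_named and the names lo and hi are hypothetical and used purely for illustration:

```r
# How c() builds names when one tagged argument holds several values.
ff <- function(i) c(min(i), max(i))            # unnamed result, as in the question
x <- c(4, 10, 7)

# An unnamed length-2 result under the tag val3 becomes val31, val32.
unnamed <- c(val1 = min(x), val3 = ff(x))
print(names(unnamed))

# Hypothetical variant that returns a named vector.
ff_named <- function(i) c(lo = min(i), hi = max(i))

# With a tag, the tag and the inner names are joined with a dot.
tagged <- c(val3 = ff_named(x))
print(names(tagged))

# Without a tag, the inner names are kept as they are.
untagged <- c(val1 = min(x), ff_named(x))
print(names(untagged))
```

So if readable column names matter, it is probably simpler to have the function return a named vector and pass it to c() without a tag.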
Upvotes: 3
Views: 1551
Reputation: 263332
If you construct your function to return a named vector, then data.table will accept it and populate the columns with those names, returning the desired structure:
require(data.table)
ff = function(i) {
return(c(val3=min(i),val4=max(i)))
}
setDT(df)[, as.list(c(var1 = min(x), var2 = max(x), ff(x))), id]
#-----------
   id var1 var2 val3 val4
1:  1    4   10    4   10
2:  2    1    9    1    9
3:  3    2    8    2    8
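If staying with plyr is preferred, one possible alternative is to drop summarise and pass ddply an anonymous function that receives each group's data frame, so ff can be evaluated once per group and its result reused. This is a sketch rather than a benchmarked solution; the names d and out are placeholders, and the data are rebuilt as in the question's first example:

```r
library(plyr)

ff <- function(i) c(min(i), max(i))

set.seed(12345)
df <- data.frame(id = rep(1:3, 4),
                 x  = sample(1:10, 12, replace = TRUE))

res <- ddply(df, .(id), function(d) {
  out <- ff(d$x)                  # single call to ff for this group
  data.frame(val1 = min(d$x),
             val2 = max(d$x),
             val3 = out[1],       # reuse the stored result
             val4 = out[2])
})
print(res)
```

Because out is an ordinary variable inside the function, derived columns such as out[1] + out[2] can be added to the same data.frame() call without evaluating ff again.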
Upvotes: 1