Reputation: 67
In the example below, I have a function (meansd
) that returns the mean of a vector and optionally returns the standard deviation as well. When combining [.data.table
, the .
/list
syntax in j
and by
the ideal result that is long on by, but wide on metric (e.g. the mean and sd returned by the function) as shown in the dcast
result.
Is it possible to get the result of the dcast
call directly from a call to [.data.table
similar to a[, .(meansd(var1, T), meansd(var1, T)), by = bvar]
(and therefore skip the dcast step)? Or does the error from the last example (combining transpose and by) mean I'm out of luck?
library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
meansd = function(x, sd = TRUE){
if(!sd) return(mean(x))
return(list(mean = mean(x), sd = sd(x)))
}
#Presumable syntax:
a[, .(meansd(var1, T), meansd(var1, T)), by = bvar] #data are long on metric which is not ideal
#> bvar V1 V2
#> 1: 1 32.8 32.8
#> 2: 1 51.8575 51.8575
#> 3: 2 6 6
#> 4: 2 4.690416 4.690416
#What I was hoping the output would be (dcast call results):
ideal = a[, .(meansd(var1, T), meansd(var1, T)), by = bvar]
ideal[, metric := rep(c('mean', 'sd'),2)]
#ideally the output would look something like this, without the need for the dcast:
dcast(ideal,bvar~metric, value.var = c('V1','V2')) #names are nice, but unimportant
#> bvar V1_mean V1_sd V2_mean V2_sd
#> 1: 1 32.8 51.8575 32.8 51.8575
#> 2: 2 6 4.690416 6 4.690416
Some related things I've tried:
#some success when only operating on one column:
a[, .(meansd(var1, F))] #returned vector becomes a data.table per `.()` syntax
#> V1
#> 1: 19.4
a[, .(meansd(var1, T))] #why is this long? Is it possible to be wide? Why does the inclusion of `.()` change it?
#> V1
#> 1: 19.4
#> 2: 37.47651
a[, .(t(meansd(var1, T)))] #transpose works alright
#> mean sd
#> 1: 19.4 37.47651
a[, .(t(meansd(var1, T))), by = bvar] #by does not like a transpose
#> Error in `[.data.table`(a, , .(t(meansd(var1, T))), by = bvar): All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
Upvotes: 0
Views: 49
Reputation: 17715
If you concatenate your two calls to meansd
into a single vector and then transpose it, you'll get 4 separate columns.
Then you just need to clean up by renaming your columns:
library('data.table')
a = data.table(var1 = c(1,2,3,10,123,12,31,4,6,2), bvar = 1:2)
meansd = function(x, sd = TRUE){
if(!sd) return(mean(x))
return(list(mean = mean(x), sd = sd(x)))
}
a = a[, t(c(meansd(var1, TRUE), meansd(var1, TRUE))), by=bvar]
colnames(a) = c("bvar", "V1_mean", "V1_sd", "V2_mean", "V2_sd")
a
#> bvar V1_mean V1_sd V2_mean V2_sd
#> 1: 1 32.8 51.857497 32.8 51.857497
#> 2: 2 6.0 4.690416 6.0 4.690416
Upvotes: 1