statquant

Reputation: 14370

Mixing by and .SDcols in data.table

I am trying to combine by and .SDcols in data.table (CRAN 1.9.6; I also tested the dev version from GitHub, so it is likely a misunderstanding on my part).

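library(data.table)

# f prints its input and returns a two-column data.table
f = function(x){
    print(x)
    res = data.table(X=x, Y=x*x)
    return(res)
}
DT = data.table(x=1:4, y=rep(c('a','b'),2))
# this is the call that fails
DT[,c('A','B'):=lapply(.SD,FUN=f),.SDcols='x',by=y]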

I get:

[1] 1 3
Error in `[.data.table`(DT, , `:=`(c("A", "B"), lapply(.SD, FUN = f)),  : 
  All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.

I would expect

   x y A  B
1: 1 a 1  1
2: 2 b 2  4
3: 3 a 3  9
4: 4 b 4 16

I would have expected the by operation to take place and .SD to be replaced by the 'x' column. Could someone explain why I am wrong here?

Upvotes: 0

Views: 1687

Answers (1)

statquant

Reputation: 14370

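All of the following work. As @Frank pointed out, the problem was the nesting level of the list returned by lapply: lapply(.SD, f) returns a list with one element per column of .SD, and here that single element is itself a data.table, so j ends up as a list containing a data.table rather than a list of atomic vectors.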

DT[,f(.SD[[1]]),.SDcols='x',by=y]
   y X  Y
1: a 1  1
2: a 3  9
3: b 2  4
4: b 4 16

DT[,lapply(.SD, f)[[1]],.SDcols='x',by=y]
   y X  Y
1: a 1  1
2: a 3  9
3: b 2  4
4: b 4 16

DT[,rbindlist(lapply(.SD, f)),.SDcols='x',by=y]
   y X  Y
1: a 1  1
2: a 3  9
3: b 2  4
4: b 4 16

DT[,sapply(.SD, f),.SDcols='x',by=y]
   y V1 V2
1: a  1  1
2: a  3  9
3: b  2  4
4: b  4 16

DT[,mapply(FUN=f, mget('x')),by=y]
   y V1 V2
1: a  1  1
2: a  3  9
3: b  2  4
4: b  4 16
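
The calls above return a new table rather than adding columns to DT. A minimal sketch of the original := assignment with the extra nesting level removed is given here, assuming a hypothetical helper f2 (not in the original post) that returns a plain list and does not print:

```r
library(data.table)

# f2 is a hypothetical variant of f: same values, returned as a plain list
f2 <- function(x) list(X = x, Y = x * x)

DT <- data.table(x = 1:4, y = rep(c("a", "b"), 2))

# lapply over the single selected column wraps the result one level too deep:
# the outer list has one element (named x), which itself holds X and Y
nested <- lapply(DT[, .(x)], f2)
str(nested)

# Calling f2 directly on the only column of .SD yields the flat
# two-element list that := expects for the two new columns
DT[, c("A", "B") := f2(.SD[[1]]), .SDcols = "x", by = y]
print(DT)
```

Because := assigns by reference within each group, the rows keep their original order, which matches the layout expected in the question.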

Upvotes: 2
