R: Force data.table to compute all interactions

Question

Here is a data.table:

dat = data.table(var1=rnorm(120), var2=rep(c('a','b','c'),40), var3=rep(c(1,2,3,2,1,2,1,2,2,3,1,2),10))

dat2 = dat[,list(resp = mean(var1)),by=list(var2, var3)]

In dat2, only existing interactions of dat$var2 et dat$var3 are present. How can I force dat2 to contain results for all 9 possible interactions (instead of the 7 rows of dat2) for var2 and var3? If there is no direct solutions with data.table, what is the easiest way to solve this issue?

table(dat$var2, dat$var3)

     1  2  3
  a 20 10 10
  b 20 20  0
  c  0 30 10

Of course, for the interactions where no data exist in dat, dat2 should contain NA in resp.

Simon O&#39;Hanlon · Accepted Answer

You could set the key and then do a crossjoin using CJ in the i like so...

setkey( dat , var2 , var3 )

# Thanks to @Shadow for pointing out to use unique() in the cross join
dat[ CJ( unique(var2) , unique(var3) ) , mean(var1) ]
#   var2 var3          V1
#1:    a    1 -0.25771923
#2:    a    2  0.04143057
#3:    a    3  0.28878451
#4:    b    1  0.18865887
#5:    b    2  0.53632552
#6:    b    3          NA
#7:    c    1          NA
#8:    c    2  0.38015021
#9:    c    3  0.49809159

And by way of explanation, CJ() creates a data.table in the i of x (in this case dat) to join on. It is formed as the cross product of the vectors supplied to CJ(), which happens to be precisely what you are looking for!

R: Force data.table to compute all interactions

Answers (1)

Related Questions