Discus23

Reputation: 521

Create all cross combinations, respecting the column hierarchy

From my data.table DT.in, I want to create all cross-join combinations, but not just a single CJ(_all_my_var_).

I need to do something like this:

CJ(var1)
CJ(var1, var2)
CJ(var1, var2, var3)
...

Here is my reprex:

library(data.table)
set.seed(999)

DT.in <- data.table(lvl1 = rep('AA', 200),
                 code = sample(c('D44', 'J21'), 200, replace = TRUE),
                 var = sample(c('Z3R', 'TR5', 'JKL', 'FR5', 'TFX'), 200, replace = TRUE),
                 test = sample(c('ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN'), 200, replace = TRUE))

foo.1 <- DT.in[, .(new = CJ(lvl1, unique = TRUE))]
foo.2 <- DT.in[, .(new = CJ(lvl1, code, unique = TRUE))]
foo.3 <- DT.in[, .(new = CJ(lvl1, code, var, unique = TRUE))]
foo.4 <- DT.in[, .(new = CJ(lvl1, code, var, test, unique = TRUE))]

DT.out <- rbindlist(list(foo.1, foo.2, foo.3, foo.4), fill = TRUE)

DT.out is what I want, but of course I need to automate this: the names and the number of variables are never the same, and I still can't work out how to do it.

Any help with this would be much appreciated.

(Maybe there is a completely different way to get from DT.in to DT.out?)

Upvotes: 1

Views: 115

Answers (1)

akrun

Reputation: 887881

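We can use do.call with CJ:

  1. Loop over the column positions of 'DT.in' with lapply (seq_along)
  2. For each position i, select the first i columns by numeric index (with = FALSE)
  3. Pass those columns to CJ via do.call, with unique = TRUE
  4. Bind the resulting list of tables into a single one with rbindlist (fill = TRUE)

library(data.table)

# For i = 1, 2, ..., ncol(DT.in): cross join the unique values of the first i columns,
# then stack the results; fill = TRUE pads the columns a shorter join lacks with NA
DT.out2 <- rbindlist(lapply(seq_along(DT.in), function(i)
      do.call(CJ, c(DT.in[, seq_len(i), with = FALSE], unique = TRUE))), fill = TRUE)
# Prefix the column names with "new." to match the OP's output
setnames(DT.out2, paste0("new.", names(DT.out2)))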

Checking against the OP's output:

identical(DT.out, DT.out2)
[1] TRUE
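
Since the OP mentions that the names and number of variables change every time, the same idea can be wrapped in a small function that takes the hierarchy as a character vector of column names. This is only a sketch: `cj_hierarchy` and `DT.small` are hypothetical names introduced for illustration, not part of data.table or of the original post, and the sample data is made up.

```r
library(data.table)

# Hypothetical helper (not part of data.table): cross join the unique values
# of cols[1], then cols[1:2], and so on, and stack the results.
cj_hierarchy <- function(DT, cols = names(DT)) {
  rbindlist(
    lapply(seq_along(cols), function(i) {
      # First i columns of the hierarchy, passed to CJ as named arguments
      sub <- DT[, cols[seq_len(i)], with = FALSE]
      do.call(CJ, c(as.list(sub), list(unique = TRUE)))
    }),
    fill = TRUE  # shorter joins get NA in the columns they lack
  )
}

# Small made-up table: 1 unique lvl1, 2 unique code, 2 unique var
DT.small <- data.table(lvl1 = c("AA", "AA", "AA"),
                       code = c("D44", "J21", "D44"),
                       var  = c("Z3R", "TR5", "Z3R"))

res <- cj_hierarchy(DT.small)
nrow(res)  # 1 + 1*2 + 1*2*2 = 7 rows
```

Passing `cols` explicitly, e.g. `cj_hierarchy(DT.in, c("lvl1", "code"))`, would restrict or reorder the hierarchy without touching the table itself.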

Upvotes: 5
