Ari B. Friedman

Reputation: 72741

Retaining by variables in R data.table by-without-by

I'd like to retain the by variables in a by-without-by operation using data.table.

I have a by-without-by operation that worked about two years ago, but with the latest version of data.table it no longer returns the by columns, so I think the behavior must have changed.

Here's a reproducible example:

library(data.table)
dt <- data.table( by1 = letters[1:3], by2 = LETTERS[1:3], x = runif(3) )
by <- c("by1","by2")
allPermutationsOfByvars <- do.call(CJ, sapply(dt[,by,with=FALSE], unique, simplify=FALSE)) ## CJ() to form index
setkeyv(dt, by)
dt[ allPermutationsOfByvars, list( x = x ) ]

Which produces:

> dt[ allPermutationsOfByvars, list( x = x ) ]
           x
1: 0.9880997
2:        NA
3:        NA
4:        NA
5: 0.4650647
6:        NA
7:        NA
8:        NA
9: 0.4899873

I could just do:

> cbind( allPermutationsOfByvars, dt[ allPermutationsOfByvars, list( x = x ) ] )
   by1 by2         x
1:   a   A 0.9880997
2:   a   B        NA
3:   a   C        NA
4:   b   A        NA
5:   b   B 0.4650647
6:   b   C        NA
7:   c   A        NA
8:   c   B        NA
9:   c   C 0.4899873

This does work, but it is inelegant and possibly inefficient.

Is there an argument I'm missing or a clever stratagem to retain the by variables?


Answers (1)

eddi

Reputation: 49448

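Joins no longer group by each row of i implicitly. Add by = .EACHI to get the "by-without-by" behavior (by-EACH-element-of-I), which also keeps the join columns in the result: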

dt[allPermutationsOfByvars, x, by = .EACHI]
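For reference, here is a minimal self-contained sketch of that call on the question's data. The `set.seed(1)` line and the names `by_cols`, `grid`, and `res` are my own additions for illustration, not part of the original code, and the `x` values will differ from those shown in the question.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the values are repeatable
dt <- data.table(by1 = letters[1:3], by2 = LETTERS[1:3], x = runif(3))
by_cols <- c("by1", "by2")  # hypothetical name, stands in for `by` in the question
setkeyv(dt, by_cols)

# Cross join of the unique values of each by column (3 x 3 = 9 rows).
grid <- do.call(CJ, lapply(dt[, by_cols, with = FALSE], unique))

# by = .EACHI evaluates j once per row of `grid` and keeps the join columns.
res <- dt[grid, list(x = x), by = .EACHI]
print(res)
```

The result should have the columns by1, by2, and x, with NA in x for combinations that do not occur in dt.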

And this is how I'd have done the initial part:

allPermutationsOfByvars = dt[, do.call(CJ, unique(setDT(mget(by))))]

Finally, the on argument is usually the better choice now (vs setkey), since it names the join columns in the call itself and does not reorder dt.
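A rough sketch of the same join written with on instead of a key, assuming a reasonably recent data.table. The fixed `x` values and the names `by_cols`, `grid`, and `res_on` are illustrative assumptions, not taken from the question.

```r
library(data.table)

# Assumption: fixed x values instead of runif() so the result is easy to check.
dt <- data.table(by1 = letters[1:3], by2 = LETTERS[1:3], x = c(0.1, 0.2, 0.3))
by_cols <- c("by1", "by2")  # hypothetical name for the join columns

# Cross join of the unique values of each by column.
grid <- do.call(CJ, lapply(dt[, by_cols, with = FALSE], unique))

# No setkeyv() call: on = names the join columns, and dt stays unkeyed.
res_on <- dt[grid, list(x = x), on = by_cols, by = .EACHI]
print(res_on)
```

Because nothing sets a key here, dt keeps its original row order, which matters if later code depends on that order.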
