Gastove

How to pass a parameter by variable into data.table[J()]

I'm brand new to the (completely marvelous) data.table package, and seem to have gotten stuck on a very basic, somewhat bizarre problem. I can't post the exact data set I'm working with, for which I apologize -- but I think the problem is simple enough to articulate that hopefully this will still be very clear.

Let's say I have a data.table like so, with key x:

set1
   x y
1: 1 a
2: 1 b
3: 1 c
4: 2 a

I want to return a subset of set1 containing all rows where x == 1. This is wonderfully simple in data.table: set1[J(1)]. Bam. Done. I can also assign z <- 1, and call set1[J(z)]. Again: works great.
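A minimal reproducible version of the toy example, assuming x is numeric and y is character (the question doesn't show the column types):

```r
library(data.table)

# Toy table from the question (column types assumed)
set1 <- data.table(x = c(1, 1, 1, 2), y = c("a", "b", "c", "a"))
setkey(set1, x)   # J() in i joins against this key

set1[J(1)]        # rows where x == 1 (y = "a", "b", "c")

z <- 1
set1[J(z)]        # same rows: set1 has no column named z, so z is found in the calling scope
```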

...except when I try to scale it up to my actual data set, which contains ~6M rows. When I call set1[J(1674)], I get back a 78-row result that's exactly what I'm looking for. But I need to be able to look up (literally) 4M of these subsets. When I assign the value I'm searching for to a variable, id <- 1674, and call set1[J(id)]... R nearly takes down my desktop.

Clearly something I don't understand is going on under the data.table hood, but I haven't been able to figure out what. Googling and slogging through Stack Overflow suggest that this should work. Out of pure whimsy, I've tried:

id <- quote(1674)
set1[J(eval(id))]

...but that is far, far worse. What... what's going on?

Upvotes: 2

Views: 610

Answers (1)

Matt Dowle

[ @mnel beat me to it as I was writing ...]

Almost certainly, one of set1's columns happens to be called "id"; i.e., this is TRUE:

isTRUE("id" %in% names(set1))

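That would cause set1[J(id)] to self-join set1$id to set1, ignoring the id in the calling scope. Inside J(), id resolves to the column, so every value of set1$id (one per row, ~6M of them) becomes a lookup key instead of the single value 1674.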
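A small sketch of that failure mode, assuming a hypothetical extra column named id whose values all point at the x == 2 group. With the literal 1 the join looks up one key; with id, the whole column becomes the lookup table:

```r
library(data.table)

# Hypothetical table: set1 plus a column that happens to be named id
set1 <- data.table(x  = c(1, 1, 1, 2),
                   y  = c("a", "b", "c", "a"),
                   id = c(2, 2, 2, 2))
setkey(set1, x)

id <- 1                 # the value we *meant* to look up

good <- set1[J(1)]      # one lookup key: the 3 rows with x == 1
bad  <- set1[J(id)]     # J(id) is evaluated inside set1, so id is set1$id = c(2, 2, 2, 2)

nrow(good)              # 3
nrow(bad)               # 4: each of the 4 keys matches the single x == 2 row
```

On the real table the same substitution would turn one lookup into roughly 6M lookups, one per row of set1$id, which is consistent with the memory blow-up described in the question.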

If so, there are several ways to avoid scoping issues like this:

.id = <your 4M ids>
set1[J(.id)]

or use the fact that when i is a single name, it is evaluated in the calling scope:

JDT=J(id); set1[JDT]

or the fact that an eval() in i is evaluated in the calling scope, too:

set1[eval(J(id))]
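A sketch checking the three workarounds against the same hypothetical clashing table (the id column is an assumption for illustration). For the single-name approach it builds the lookup table with data.table(id) instead of J(id), because calling J() outside [...] may not be supported in every data.table version:

```r
library(data.table)

# Same hypothetical clash: a column named id, plus a variable id in the calling scope
set1 <- data.table(x  = c(1, 1, 1, 2),
                   y  = c("a", "b", "c", "a"),
                   id = c(2, 2, 2, 2))
setkey(set1, x)
id <- 1

# 1. A variable name that no column uses
.id <- 1
r1 <- set1[J(.id)]

# 2. A single name in i is looked up in the calling scope
JDT <- data.table(id)   # one column, named id, holding the variable's value 1
r2 <- set1[JDT]

# 3. eval() in i is evaluated in the calling scope, so id here is the variable
r3 <- set1[eval(J(id))]

sapply(list(r1, r2, r3), nrow)   # 3 3 3
```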

We do want to make this clearer, more robust and easier, though, so one thought is to add .. as an alias for eval:

set1[..(J(id))]     # .. alias for eval

or perhaps:

set1[J(..id)]

where .. borrows its meaning from the file system's .., i.e. one level up. If .. were a prefix for symbols, you could then write something like:

DT[colB==..id]

Here == is just for illustration: colB is expected to be a column name, and ..id would find id in the calling scope (one level up). The thinking is that it would then be quite clear to the reader of the code what the programmer intended.

Upvotes: 3
