Reputation: 3919
I'm trying to learn the data.table package in R. I have a data table named DT1 and a data frame DF1, and I'd like to subset some rows according to a logical condition (a disjunction). This is my code for now:
DF1[DF1$c1==0 | DF1$c2==1,] #the data.frame way with the data.frame DF1
DT1[DT1$c1==0 | DT1$c2==1,] #the data.frame way with the data.table DT1
On page 5 of "Introduction to the data.table package in R", the author gives a similar example but with a conjunction (replace | by & in the second line above) and remarks that this is a bad use of the data.table package. He suggests doing it this way instead:
setkey(DT1,c1,c2)
DT1[J(0,1)]
So, my question is: how can I write the disjunction condition with data.table syntax? Is my second line, DT1[DT1$c1==0 | DT1$c2==1,], a misuse of the package? Is there an equivalent of J for disjunction?
Upvotes: 5
Views: 303
Reputation: 6884
Here is another solution:
library(data.table)
grpsize = ceiling(1e7/26^2)
DT <- data.table(
x=rep(LETTERS,each=26*grpsize),
y=rep(letters,each=grpsize),
v=runif(grpsize*26^2))
setkey(DT, x)
system.time(DT1 <- DT[x=="A" | x=="Z"])
user system elapsed
0.68 0.05 0.74
system.time(DT2 <- DT[J(c("A", "Z"))])
user system elapsed
0.08 0.00 0.07
all.equal(DT1[, v], DT2[, v])
TRUE
Note that I took the example from the data.table documentation. The only difference is that I no longer convert the letters into factors, because character keys are now allowed (see NEWS for v1.8.0).
A short explanation: J is just short for data.table. So if you call J(0, 1), you create a data.table with two columns, which are matched against the two key columns, just like in the example:
> J(0,1)
V1 V2
[1,] 0 1
You, however, want to match two different values in one column. Therefore, you need a data.table with one column. So just add c():
J(c(0,1))
V1
[1,] 0
[2,] 1
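The J(c(0,1)) form above covers several values in one key column. For the disjunction in the question, which spans two different columns (c1 and c2), one possible approach is a keyed lookup per column followed by a union. This is only a sketch: the toy data and the helper column id are assumptions made for illustration, and because the table is re-keyed twice it may well be slower than a plain vector scan on real data.

```r
library(data.table)

# Hypothetical toy data; c1 and c2 follow the question's column names.
DT1 <- data.table(c1 = c(0, 0, 1, 1, 2), c2 = c(1, 2, 1, 3, 3))
DT1[, id := .I]                      # row id (assumed helper) to deduplicate later

# Vector scan: rows where c1 == 0 OR c2 == 1.
scan_res <- DT1[c1 == 0 | c2 == 1]

# Keyed alternative: one lookup per column, then union the results.
setkey(DT1, c1)
by_c1 <- DT1[J(0), nomatch = 0L]     # rows with c1 == 0
setkey(DT1, c2)
by_c2 <- DT1[J(1), nomatch = 0L]     # rows with c2 == 1

# A row matching both conditions appears twice, so drop duplicates by id.
keyed_res <- unique(rbind(by_c1, by_c2), by = "id")
```

Both results contain the same rows, although possibly in a different order, so comparing sorted id values is a simple way to check them against each other.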
Upvotes: 3
Reputation: 263481
That document indicates that you could have used:
DT1[c1==0 | c2==1, ]
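A small sketch of why this works: inside DT1[...], column names are evaluated within the data.table, so the DT1$ prefix is not needed. The toy values below are assumptions for illustration; only the names DF1, DT1, c1 and c2 come from the question.

```r
library(data.table)

# Hypothetical toy data using the question's names.
DF1 <- data.frame(c1 = c(0, 0, 1, 1, 2), c2 = c(1, 2, 1, 3, 3))
DT1 <- as.data.table(DF1)

df_way <- DF1[DF1$c1 == 0 | DF1$c2 == 1, ]   # data.frame subsetting
dt_way <- DT1[c1 == 0 | c2 == 1]             # columns referenced by name inside DT1

# Compare column contents only; row names and classes differ between the two.
same_rows <- identical(df_way$c1, dt_way$c1) && identical(df_way$c2, dt_way$c2)
```

Note that this is still a vector scan over both columns, not a keyed lookup; it is just written in a more compact form.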
Upvotes: 4