nhern121

Reputation: 3919

How would you translate this into data.table syntax in R?

I'm trying to learn the data.table package in R. I have a data table named DT1 and a data frame DF1, and I'd like to subset some rows according to a logical condition (a disjunction). This is my code for now:

DF1[DF1$c1==0 | DF1$c2==1,] #the data.frame way with the data.frame DF1
DT1[DT1$c1==0 | DT1$c2==1,] #the data.frame way with the data.table DT1
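To make the two lines above runnable, here is a minimal setup. The data is hypothetical: the question does not show DF1 or DT1, so the columns c1, c2 and v below are made up for illustration.

```r
library(data.table)

# Hypothetical toy data: c1, c2 and v are made up, not taken from the question
DF1 <- data.frame(c1 = c(0, 0, 1, 1, 2, 2),
                  c2 = c(1, 2, 1, 2, 1, 2),
                  v  = 1:6)
DT1 <- as.data.table(DF1)

# Rows where c1 == 0 OR c2 == 1, written the data.frame way on both objects
res_df <- DF1[DF1$c1 == 0 | DF1$c2 == 1, ]
res_dt <- DT1[DT1$c1 == 0 | DT1$c2 == 1, ]
print(res_dt)
```

Both forms keep rows 1, 2, 3 and 5 of this toy table.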

On page 5 of "Introduction to the data.table package in R", the author gives an example of something similar but with a conjunction (replace | with & in the second line above) and remarks that this is a bad use of the data.table package. He suggests doing it this way instead:

setkey(DT1,c1,c2)
DT1[J(0,1)]
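A minimal sketch of the conjunction case from the introduction, on made-up data (c1, c2 and v are hypothetical), checking that the keyed join returns the same rows as the & vector scan:

```r
library(data.table)

# Hypothetical toy data (c1, c2 and v are made up for illustration)
DT1 <- data.table(c1 = c(0, 0, 1, 1, 2, 2),
                  c2 = c(1, 2, 1, 2, 1, 2),
                  v  = 1:6)

# Vector scan: both comparisons are evaluated on every row
by_scan <- DT1[c1 == 0 & c2 == 1]

# Keyed join: sort by (c1, c2), then look up the single key pair (0, 1)
setkey(DT1, c1, c2)
by_join <- DT1[J(0, 1)]

print(by_join)
```

Both return the single row with v == 1. The difference is in how the rows are found: the vector scan tests every row, while the keyed join uses binary search on the sorted key, which is why the introduction prefers it on large tables.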

So, my question is: how can I write the disjunction condition with data.table syntax? Is my second line, DT1[DT1$c1==0 | DT1$c2==1,], a misuse? Is there an equivalent of J for a disjunction?

Upvotes: 5

Views: 303

Answers (2)

Christoph_J

Reputation: 6884

Here is another solution:

library(data.table)
grpsize = ceiling(1e7/26^2)
DT <- data.table(
  x=rep(LETTERS,each=26*grpsize),
  y=rep(letters,each=grpsize),
  v=runif(grpsize*26^2))

setkey(DT, x)
system.time(DT1 <- DT[x=="A" | x=="Z"])
   user  system elapsed 
   0.68    0.05    0.74 
system.time(DT2 <- DT[J(c("A", "Z"))])
   user  system elapsed 
   0.08    0.00    0.07 
all.equal(DT1[, v], DT2[, v])
[1] TRUE

Note that I took the example from the data.table introduction document. The only difference is that I no longer convert the letters into factors, because character keys are now allowed (see NEWS for v1.8.0).

A short explanation: J is just shorthand for data.table. So J(0, 1) creates a one-row data.table with two columns, and each column is matched against the corresponding key column (c1 and c2), just like in the example:

> J(0,1)
     V1 V2
[1,]  0  1

You, however, want to match two different values in one column, so you need a one-column data.table. Just wrap the values in c():

> J(c(0,1))
     V1
[1,]  0
[2,]  1
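Used inside DT1[...], the one-column table is joined against the first key column only. A minimal sketch on hypothetical toy data (c1, c2 and v are made up); note that this reproduces an OR on a single column, like x=="A" | x=="Z" above, not the question's c1==0 | c2==1, which spans two columns:

```r
library(data.table)

# Hypothetical toy data (made up for illustration)
DT1 <- data.table(c1 = c(0, 0, 1, 1, 2, 2),
                  c2 = c(1, 2, 1, 2, 1, 2),
                  v  = 1:6)
setkey(DT1, c1, c2)

# A one-column i table is matched against the FIRST key column (c1),
# so this returns every row whose c1 is 0 or 1
by_join <- DT1[J(c(0, 1))]

# The equivalent vector scan on the same column
by_scan <- DT1[c1 == 0 | c1 == 1]

print(by_join)
```

With the default nomatch = NA, a value passed to J() that has no match in c1 would produce an extra row filled with NA (apart from the looked-up c1 value) rather than being dropped.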

Upvotes: 3

IRTFM

Reputation: 263481

That document indicates that you can refer to the columns directly inside DT1[...], without the DT1$ prefix:

DT1[c1==0 | c2==1, ]
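A small check on made-up data (c1, c2 and v are hypothetical) that the shorter form selects the same rows as the DT1$ version from the question:

```r
library(data.table)

# Hypothetical toy data (made up for illustration)
DT1 <- data.table(c1 = c(0, 0, 1, 1, 2, 2),
                  c2 = c(1, 2, 1, 2, 1, 2),
                  v  = 1:6)

# Inside DT1[...], column names are evaluated within the table,
# so the DT1$ prefix is unnecessary
short <- DT1[c1 == 0 | c2 == 1]
long  <- DT1[DT1$c1 == 0 | DT1$c2 == 1, ]

print(short)
```

Both keep the rows with v equal to 1, 2, 3 and 5; this is still a vector scan, just with less typing.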

Upvotes: 4
