Riccardo
Riccardo

Reputation: 763

How to subset a data table according to several conditions

I have a question which is giving me unexpected troubles. The question, which seems imbarassingly simple, is that I would like to subset a data.table according to a few conditions. Let's consider this example; imagine I have a data.table like the one below. Now, I would like to extrapolate from it a subset such that a>=1045 and a<=1180 and (b=="c" or b=="d").

dt<-data.table(a=(1000:1200), b=sample(letters[1:3], size=201, replace=T))

I read the data.table guides and FAQ and searched the forum, but I don't seem to find anything which is relevant. I apologize in advance if you think the question is meaningless.

Upvotes: 2

Views: 136

Answers (2)

MichaelChirico
MichaelChirico

Reputation: 34703

More data.table-y solution (after first defining id as as.factor(1000:1200)), i.e. taking advantage of binary search:

setkey(dt,a,b)
dt[.(rep(paste0(1045:1180),2),rep(c("b","c"),each=136)),nomatch=0L]

(we have to specify nomatch because otherwise we'll just return everything--it would be easier to tell the difference if you had a third variable, which would come back as NA for unmatched key combinations)

(of course if you cast a as numeric, we don't need the paste0 bit)


An even cleaner way of doing this (without having to specify the recycling ourselves, basically) is with the CJ function:

setkey(dt,a,b)
dt[CJ(paste0(1045:1180),c("c","d")),nomatch=0L]

Upvotes: 1

akrun
akrun

Reputation: 887118

You can try

dt[a>=1045 & a <=1180 & b %chin% c('c', 'd')]

Or as @David Arenburg commented, between can be used

dt[between(a, 1045, 1180) & b %chin% c('c', 'd')]

Upvotes: 3

Related Questions