R: Return ids where no entries for row match criteria

Question

This seems like a fairly simple operation but I can't figure out how to do it or search for it.

I have data similar to the following:

id,status
123,0
123,1
124,1
125,1
125,2
126,0
126,2

I want a list of ids that don't have any rows where the status is 2. So in the example above, the operation would return:

123
124

I believe that in sql it might be something like this:

SELECT DISTINCT id
FROM user
WHERE id NOT IN (
SELECT DISTINCT id
FROM user
WHERE status='2')
)

But I'm not sure how to do that in R.

Frank · Accepted Answer

If you put it into a data.table:

require(data.table)
mydt <- data.table(user)

Then the syntax works like this:

mydt[
  ,
  list(keep=all(status!=2)),
by=id][
  (keep),
  id
]

That is, FROM[WHERE,SELECT,BY][WHERE2,SELECT2,BY2]...[WHEREK,SELECTK,BYK]. After each set of brackets, you have a new data.table to work with for the next query in the chain.

After you install the data.table package, you can type vignette("datatable-faq") and browse to question 2.16 for details on the analogy between SQL syntax and what's available in data.table.

R: Return ids where no entries for row match criteria

Answers (1)

Related Questions