Fabian Gehring

Reputation: 1173

filter data.table for !is.na()

As far as I know, using "&" and "|" in i should be avoided, because it forces a vector scan instead of a binary search on the key. Therefore:

data<-data.table(a=c(NA, 1, 2), b=c(1, 2, 1), key="a,b")
data[is.na(a) & b==1]

should be replaced by

data[.(NA_integer_, 1)]

But when I'm interested in all non-NA entries, how should I do that? Is it OK to use the following, or does it fall back to a slower vector scan?

data[!is.na(a) & b==1]

because something like this does not seem to work

data[.(!NA_integer_, 1)]

Upvotes: 4

Views: 2688

Answers (1)

Arun

Reputation: 118879

Unfortunately, it's currently not possible to use expressions of that form in binary-search-based subsets, i.e., we cannot negate on individual key columns.

The way to perform a binary search based subset at the moment would be:

require(data.table) ## v1.9.5+
a_val = setdiff(unique(data$a), NA)
setkey(data)[.(a_val, 1), nomatch=0L]
#    a b
# 1: 2 1
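
As a rough sanity check, here is a minimal sketch that runs both approaches on the example table from the question, so the results can be compared side by side. The names scan_result and join_result are only illustrative and not part of data.table.

```r
library(data.table)

# Example table from the question, keyed on both columns.
data <- data.table(a = c(NA, 1, 2), b = c(1, 2, 1), key = "a,b")

# Vector scan: the logical expression in i is evaluated over the rows.
scan_result <- data[!is.na(a) & b == 1]

# Binary search: enumerate the non-NA values of a, then join on the key.
a_val <- setdiff(unique(data$a), NA)
join_result <- data[.(a_val, 1), nomatch = 0L]

print(scan_result)
print(join_result)
```

Note that the binary-search version still has to scan column a once to build a_val, so whether it actually pays off probably depends on the size of the table and how often the subset is repeated.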

Maybe it'd be nice to have a function, for example not() or except(), that would allow us to extract the values internally. Care to file a FR here?

Upvotes: 4
