António Mendes
António Mendes

Reputation: 183

Filtering data.table based on values in group of variables

I ran into a problem with finding the best syntax to filter out cases where I want to filter all rows where a group of variables has at least one value non null.

An example is below:

> dat <- data.table(a=1:5, b=c(1:3, NA, NA), c=c(NA, 1:3, NA))

> cols <- c('b', 'c')

> dat[!all(is.na(cols)), .SD, with=FALSE]
Null data.table (0 rows and 0 cols)

> dat[!is.na(b)|!is.na(c), .SD]
   a  b  c
1: 1  1 NA
2: 2  2  1
3: 3  3  2
4: 4 NA  3

As you can see if I explicitly say each variable name as !is.na(variable1) | !is.na(variable2) it works however I can't find a way to include a group of variables so I can do it in 1 condition only and not concatenate everything with or.

Upvotes: 1

Views: 669

Answers (2)

Santosh M.
Santosh M.

Reputation: 2454

You could also do this.

dat[rowSums(!is.na(dat[, cols, with=FALSE])) > 0,]
    a  b  c
#1: 1  1 NA
#2: 2  2  1
#3: 3  3  2
#4: 4 NA  3

Upvotes: 0

talat
talat

Reputation: 70256

You can use the following syntax with rowSums and .SD:

dat[dat[, rowSums(!is.na(.SD)) > 0, .SDcols  = cols]]
#   a  b  c
#1: 1  1 NA
#2: 2  2  1
#3: 3  3  2
#4: 4 NA  3

The inner part creates a logical value that looks like this:

dat[, rowSums(!is.na(.SD)) > 0, .SDcols  = cols]
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

Re the comment by Michael, you can also use Reduce + lapply:

dat[dat[, Reduce("+", lapply(.SD, function(x) !is.na(x))) > 0, .SDcols = cols]]

But for most of my use cases, the rowSums approach is ok and easier to read, imo.

Upvotes: 3

Related Questions