R data.table: select rows based on variables whose names are stored elsewhere

Question

Consider the following:

library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))

DataTableA

##    v1 v2 v3 v4 v5
## 1:  1 NA  3  2  1
## 2:  2  4  3 NA NA
## 3: NA NA NA  3 NA
## 4:  6 NA  4 NA NA
## 5:  3  1  2  3  3
## 6: NA  2 NA NA  4

varnames <- c("v2", "v4", "v5")

What is the best way of getting the rows of DataTableA where at least one of the variables named in varnames is not NA, without explicitly referring to the variable names?

I know I could do

DataTableA[!is.na(v2) | !is.na(v4) | !is.na(v5)]

but I want to avoid writing out the variable names.

Something that works is

DataTableA[apply(!is.na(DataTableA[, ..varnames]), 1, any)]

but I'm wondering if there's a better way. If there's not, that's OK of course. I don't have any problem with using apply as above, but what I've seen of data.table so far makes me think there might be a simpler approach.

This question is similar, but more complex.

Thanks for any help you can give.

akrun · Accepted Answer

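We can specify 'varnames' in .SDcols, loop over the columns of .SD (Subset of Data.table), test each one for non-NA values, and combine the resulting logical vectors with Reduce. Note that the test must be negated: with plain is.na, the result would be the rows where at least one of the variables is NA, which is not what was asked.

DataTableA[DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))), .SDcols = varnames]]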

Or with rowSums

DataTableA[DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]]
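
As a quick sanity check, here is a minimal sketch that compares both approaches with the explicit condition from the question. The names expected, keep_reduce and keep_rowsums are just illustrative helpers, not part of data.table.

```r
library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))
varnames <- c("v2", "v4", "v5")

# Reference result: the explicit condition written out in the question
expected <- DataTableA[!is.na(v2) | !is.na(v4) | !is.na(v5)]

# One logical vector per selected column, combined element-wise with `|`
keep_reduce <- DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))),
                          .SDcols = varnames]

# Count the non-NA values per row across the selected columns
keep_rowsums <- DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]

# Row 4 is the only row where v2, v4 and v5 are all NA, so it is dropped
which(keep_reduce)
## [1] 1 2 3 5 6

# Both approaches should select the same rows as the explicit condition
all.equal(DataTableA[keep_reduce], expected)
all.equal(DataTableA[keep_rowsums], expected)
```

Both versions build the logical index in a separate step, so it can be inspected on its own before it is used to subset the table.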
