Reputation: 185
Consider the following:
library(data.table)
DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))
DataTableA
##    v1 v2 v3 v4 v5
## 1:  1 NA  3  2  1
## 2:  2  4  3 NA NA
## 3: NA NA NA  3 NA
## 4:  6 NA  4 NA NA
## 5:  3  1  2  3  3
## 6: NA  2 NA NA  4
varnames <- c("v2", "v4", "v5")
What is the best way of getting the rows of DataTableA where at least one of the variables named in varnames is not NA, without explicitly referring to the variable names?
I know I could do
DataTableA[!is.na(v2) | !is.na(v4) | !is.na(v5)]
but I want to avoid writing out the variable names.
Something that works is
DataTableA[apply(!is.na(DataTableA[, ..varnames]), 1, any)]
but I'm wondering if there's a better way. If there's not, that's OK of course. I don't have any problem with using apply as above, but what I've seen of data.table so far makes me think there might be a simpler approach.
This question is similar, but more complex.
Thanks for any help you can give.
Upvotes: 1
Views: 251
Reputation: 887108
We can specify 'varnames' in .SDcols, loop over .SD (the Subset of Data.table), test each column for non-NA values, and Reduce the resulting logical vectors with `|`:
DataTableA[DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))), .SDcols = varnames]]
Or with rowSums, counting the non-NA values per row:
DataTableA[DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]]
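As a quick sanity check on the question's data, the sketch below builds the row filter three ways (a Reduce over per-column !is.na() tests, rowSums, and the apply call from the question) and confirms they pick the same rows. The names keep_reduce, keep_rowsums and keep_apply are only illustrative, not part of data.table.

```r
library(data.table)

DataTableA <- data.table(v1 = c(1, 2, NA, 6, 3, NA),
                         v2 = c(NA, 4, NA, NA, 1, 2),
                         v3 = c(3, 3, NA, 4, 2, NA),
                         v4 = c(2, NA, 3, NA, 3, NA),
                         v5 = c(1, NA, NA, NA, 3, 4))
varnames <- c("v2", "v4", "v5")

# Each vector is TRUE where at least one of the varnames columns is non-NA.
keep_reduce  <- DataTableA[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))),
                           .SDcols = varnames]
keep_rowsums <- DataTableA[, rowSums(!is.na(.SD)) > 0, .SDcols = varnames]
keep_apply   <- apply(!is.na(DataTableA[, ..varnames]), 1, any)

# All three approaches should agree row by row.
stopifnot(all(keep_reduce == keep_rowsums), all(keep_reduce == keep_apply))

# Row 4 is the only row where v2, v4 and v5 are all NA, so it is dropped.
which(keep_reduce)
DataTableA[keep_reduce]
```

Computing the logical vector first and subsetting with it afterwards also makes it easy to inspect which rows are kept before filtering.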
Upvotes: 2