Reputation: 183
I'm having trouble finding the right syntax to filter a data.table so that it keeps only the rows where at least one column in a group of columns is not NA.
An example is below:
> dat <- data.table(a=1:5, b=c(1:3, NA, NA), c=c(NA, 1:3, NA))
> cols <- c('b', 'c')
> dat[!all(is.na(cols)), .SD, with=FALSE]
Null data.table (0 rows and 0 cols)
> dat[!is.na(b)|!is.na(c), .SD]
a b c
1: 1 1 NA
2: 2 2 1
3: 3 3 2
4: 4 NA 3
As you can see, it works if I spell out each variable explicitly as !is.na(variable1) | !is.na(variable2). However, I can't find a way to refer to the group of columns as a whole, so the filter is a single condition instead of everything chained together with |.
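(Aside, a sketch of why the first attempt likely fails: is.na(cols) is applied to the character vector of column names rather than to the columns themselves, so the condition never inspects the data.)
> is.na(cols)
[1] FALSE FALSE
> !all(is.na(cols))
[1] TRUE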
Upvotes: 1
Views: 669
Reputation: 2454
You could also do this:
dat[rowSums(!is.na(dat[, cols, with=FALSE])) > 0,]
# a b c
#1: 1 1 NA
#2: 2 2 1
#3: 3 3 2
#4: 4 NA 3
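A minimal sketch of why this works, reusing the dat and cols from the question: !is.na() on the two-column subset returns a logical matrix, and rowSums() counts the non-NA entries in each row, so > 0 keeps every row with at least one non-missing value.
rowSums(!is.na(dat[, cols, with=FALSE]))
# [1] 1 2 2 1 0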
Upvotes: 0
Reputation: 70256
You can use the following syntax with rowSums and .SD:
dat[dat[, rowSums(!is.na(.SD)) > 0, .SDcols = cols]]
# a b c
#1: 1 1 NA
#2: 2 2 1
#3: 3 3 2
#4: 4 NA 3
The inner part creates a logical vector that looks like this:
dat[, rowSums(!is.na(.SD)) > 0, .SDcols = cols]
# [1] TRUE TRUE TRUE TRUE FALSE
Re the comment by Michael, you can also use Reduce + lapply:
dat[dat[, Reduce("+", lapply(.SD, function(x) !is.na(x))) > 0, .SDcols = cols]]
But for most of my use cases, the rowSums approach is ok and easier to read, imo.
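For completeness, a closely related variant that is not in the original answer reduces with element-wise | instead of summing, which maps directly onto the "chain everything with or" idea:
dat[dat[, Reduce("|", lapply(.SD, function(x) !is.na(x))), .SDcols = cols]]
# returns the same four rows as the rowSums approach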
Upvotes: 3