Reputation: 555
Problem: Suppose I have the following data.table
object. I want to apply the following filter condition:
(CMT_1 != "") | (CMT_2 != "") | (CMT_3 != "")
As there can be more CMT_*
and I do not know them a-prioi I want to make this condition flexible (depending on how many CMT_*
columns I have). Any suggestions how to write this nicely?
library(data.table)
dt <- data.table(
CMT_1 = 1:3,
CMT_2 = 4:6,
CMT_4 = 8:10,
remainder1 = 12:14,
remainder2 = 15:17
)
cmts <- names(dt)[startsWith(names(dt), "CMT_")]
## filter condition which I want to make flexible
dt[(CMT_1 != "") | (CMT_2 != "") | (CMT_3 != ""))
Upvotes: 0
Views: 305
Reputation: 34763
You can build the condition programmatically, then evaluate it:
cols = grep('^CMT_', names(dt), value = TRUE)
cond = Reduce(
function(x, y) call('|', x, y),
lapply(cols, function(col) call('nzchar', as.name(col)))
)
dt[eval(cond)]
building an expression with call
and name
can take some getting used to, it might be easier to use str2lang
more directly here:
cond = str2lang(
paste(sprintf('nzchar(%s)', cols), collapse = ' | ')
)
cond
# nzchar(CMT_1) | nzchar(CMT_2) | nzchar(CMT_4)
Or you can evaluate the columns one by one and Reduce
using |
alone:
Reduce(
'|',
lapply(cols, function(col) nzchar(dt[[col]]))
)
The disadvantage of the rowSums
approach is matrix conversion -- your whole table has to be converted to matrix first. nzchar
is also a faster version of x != ''
.
Upvotes: 1
Reputation: 25223
Here is one option using rowSums
and .I
to extract those rows before subsetting:
cmts <- grep("^CMT_", names(dt), value=TRUE)
dt[dt[, .I[rowSums(.SD!="") > 1L], .SDcols=cmts]]
Upvotes: 0