Reputation: 396
I have a variable list of column names in a data.table, and would like to apply the same filter to all of them to subset the rows of my table (i.e. give me the rows where all the columns in the list are >= 5)
DT = data.table(
ID = c("b","b","b","a","a","c"),
a = 1:6,
b = 7:12,
c = 13:18
)
> DT
ID a b c
1: b 1 7 13
2: b 2 8 14
3: b 3 9 15
4: a 4 10 16
5: a 5 11 17
6: c 6 12 18
cols= c("a", "b", "c")
DT[.SD >= 5, , .SDcols=cols] # something like this?
ID a b c
1: a 5 11 17
2: c 6 12 18
Upvotes: 1
Views: 1783
Reputation: 39
You could just use .SDcols = is.numeric
if you were looking at all the numeric columns. This saves one step.
Upvotes: 0
Reputation: 887118
If we need to have the conditions met for all the columns, create a list
of logical vectors and then Reduce
it to a single logical vector
DT[DT[, Reduce(`&`, lapply(.SD, `>=`, 5)), .SDcols = cols]]
# ID a b c
#1: a 5 11 17
#2: c 6 12 18
Or another option with rowSums
DT[ DT[, rowSums(.SD >= 5) == length(cols), .SDcols = cols]]
NOTE: Both options are vectorized and are efficient
DT1 <- DT[rep(seq_len(nrow(DT)), 1e6)]
system.time(DT1[ DT1[, rowSums(.SD >= 5) == length(cols), .SDcols = cols]])
# user system elapsed
# 0.464 0.127 0.555
system.time(DT1[DT1[, Reduce(`&`, lapply(.SD, `>=`, 5)), .SDcols = cols]])
# user system elapsed
# 0.134 0.022 0.150
system.time(DT1[ DT1[, apply(.SD >= 5, 1, all), .SDcols=cols], ])
# user system elapsed
# 6.636 0.087 6.687
Upvotes: 4
Reputation: 12713
DT[ DT[, apply(.SD >= 5, 1, all), .SDcols=cols], ]
# ID a b c
# 1: a 5 11 17
# 2: c 6 12 18
Upvotes: 2