Reputation: 13807
Two related questions:
EDIT: ps. I'm looking for solutions based on data.table
.
1. How to select the rows of data.table
above a certain threshold for all columns?
2. How to select the columns of data.table
which contain values above a certain threshold ?
Reproducible example:
library(data.table)
dt <- data.table(V1=1:5, V2=3:7, V3=7:3)
# this line selects rows based on column `V1`.
dt[ V1 > 2, ]
# I'm looking for a way to select rows based on values of all columns. My failed attempt
dt[ names(dt) > 2, ]
# *expected output*: a data.table with all columns but only with those rows where all values are `> 2`
#> V1 V2 V3
#> 3 5 5
#> 4 6 4
#> 5 7 3
# My failed attempt
dt[, .SD, .SDcols > 2 ]
# *expected output*: a data.table with all rows but only with those columns where all values are `> 2`
#> V2 V3
#> 3 7
#> 4 6
#> 5 5
#> 6 4
#> 7 3
Upvotes: 4
Views: 16863
Reputation: 16697
Optionally, much more complex solution than rowMeans
but gives more flexibility. Using lhs.all
helper function to recycle expression for all fields in LHS of provided expression.
library(data.table)
dt = data.table(V1=1:5, V2=3:7, V3=7:3)
lhs.all = function(pseudo.expr) {
sub.pseudo.expr = substitute(pseudo.expr)
stopifnot(is.call(sub.pseudo.expr), is.character(cols <- eval.parent(sub.pseudo.expr[[2L]])))
l.expr = lapply(cols, function(x) {
sub.expr=sub.pseudo.expr
sub.expr[[2L]] = as.name(x)
sub.expr
})
Reduce(function(a, b) bquote(.(a) & .(b)), l.expr)
}
lhs.all(names(dt) > 2)
#V1 > 2 & V2 > 2 & V3 > 2
dt[eval(lhs.all(names(dt) > 2))]
# V1 V2 V3
#1: 3 5 5
#2: 4 6 4
#3: 5 7 3
Upvotes: 0
Reputation: 1809
For subsetting rows, the following code uses base R. Because you are looking across rows, you are treating the data table more like a matrix.
rws <- apply(dt, 1L, function(r) any(r > 4))
dt[rws]
For columns, you can use the list-like properties of the data table again:
cls <- sapply(dt, function(c) any(c > 4))
dt[, cls, with = FALSE]
Upvotes: 1
Reputation: 31161
For getting all columns but only rows where all values are superior to a treshold, the best way is to use classic filtering as:
dt[rowMeans(dt>threshold)==1,]
For getting all rows but only columns where all values are superior to a treshold, you can do:
dt[,colMeans(dt>threshold)==1, with=F]
Upvotes: 2