rafa.pereira
rafa.pereira

Reputation: 13807

Select ALL Columns | Rows of a data.table based on condition

Two related questions:

EDIT: ps. I'm looking for solutions based on data.table.

1. How to select the rows of data.table above a certain threshold for all columns?

2. How to select the columns of data.table which contain values above a certain threshold ?

Reproducible example:

library(data.table)
dt <- data.table(V1=1:5, V2=3:7, V3=7:3)

conditionally selecting all rows

# this line selects rows based on column `V1`. 
  dt[ V1 > 2, ] 

# I'm looking for a way to select rows based on values of all columns. My failed attempt
  dt[ names(dt) > 2, ] 

# *expected output*: a data.table with all columns but only with those rows where all values are `> 2` 

#> V1 V2 V3
#> 3  5  5
#> 4  6  4
#> 5  7  3

conditionally selecting all columns

# My failed attempt
  dt[, .SD, .SDcols > 2 ]

# *expected output*: a data.table with all rows but only with those columns where all values are `> 2`

#>   V2 V3
#>   3  7
#>   4  6
#>   5  5
#>   6  4
#>   7  3

Upvotes: 4

Views: 16863

Answers (3)

jangorecki
jangorecki

Reputation: 16697

Optionally, much more complex solution than rowMeans but gives more flexibility. Using lhs.all helper function to recycle expression for all fields in LHS of provided expression.

library(data.table)
dt = data.table(V1=1:5, V2=3:7, V3=7:3)

lhs.all = function(pseudo.expr) {
    sub.pseudo.expr = substitute(pseudo.expr)
    stopifnot(is.call(sub.pseudo.expr), is.character(cols <- eval.parent(sub.pseudo.expr[[2L]])))
    l.expr = lapply(cols, function(x) {
        sub.expr=sub.pseudo.expr
        sub.expr[[2L]] = as.name(x)
        sub.expr
    })
    Reduce(function(a, b) bquote(.(a) & .(b)), l.expr)
}
lhs.all(names(dt) > 2)
#V1 > 2 & V2 > 2 & V3 > 2
dt[eval(lhs.all(names(dt) > 2))]
#   V1 V2 V3
#1:  3  5  5
#2:  4  6  4
#3:  5  7  3

Upvotes: 0

CJB
CJB

Reputation: 1809

For subsetting rows, the following code uses base R. Because you are looking across rows, you are treating the data table more like a matrix.

rws <- apply(dt, 1L, function(r) any(r > 4))
dt[rws]

For columns, you can use the list-like properties of the data table again:

cls <- sapply(dt, function(c) any(c > 4))
dt[, cls, with = FALSE]

Upvotes: 1

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

For getting all columns but only rows where all values are superior to a treshold, the best way is to use classic filtering as:

dt[rowMeans(dt>threshold)==1,]

For getting all rows but only columns where all values are superior to a treshold, you can do:

dt[,colMeans(dt>threshold)==1, with=F]

Upvotes: 2

Related Questions