How to select columns based on range/value contained in rows?

Question

I'm attempting to select the columns (variables) where the values in the rows meet a particular condition, e.g. within a specified range.

My data:

    structure(list(X1 = c(35L, 45L, 67L, 890L, 23L, 4L, 5L, 34L, 
78L, 12L, 3L), X2 = c(100L, 2345L, 3467L, 235L, 46L, 76L, 32L, 
 17L, 346L, 789L, 32L), X3 = c(345L, 67L, 45L, 342L, 67L, 31L, 
6L, 78L, 9L, 34L, 23L), X4 = c(23L, 45L, 657L, 231L, 456L, 342L, 
56L, 44567L, 3211L, 2342L, 2L)), row.names = c(NA, -11L), class = c("data.table", 
 "data.frame"))

As a simple example just finding one value:

      out <- dt[, lapply(.SD, function(x) which(x == 5))]

The above produces a table with all columns present, and with NA values

        X1     X2    X3   X4
        7      NA    4    NA

I can get rid of these NA values through doing this, just to get the number of columns I want.

       out2 <- out %>% select_if(~ !any(is.na(.)))

      X1    X3
      7     4

Questions:

is there a way I can do this using data.table with one line of code?
how would I specify a range of values e.g ( > 5 & < 1000)
Would it be better to transpose and have variables in rows and use something like dplyr filter_all?

Any help or points in the right direction would be appreciated.

Andre Wildberg · Accepted Answer

Selecting columns with sapply and with=FALSE

Specific value:

dt[, sapply(dt, function(x) any(x == 342)), with=FALSE]
     X3    X4
 1: 345    23
 2:  67    45
 3:  45   657
 4: 342   231
 5:  67   456
 6:  31   342
 7:   6    56
 8:  78 44567
 9:   9  3211
10:  34  2342
11:  23     2

Range:

dt[, sapply(dt, function(x) all(x > 5 & x < 1000)), with=FALSE]
     X3
 1: 345
 2:  67
 3:  45
 4: 342
 5:  67
 6:  31
 7:   6
 8:  78
 9:   9
10:  34
11:  23

How to select columns based on range/value contained in rows?

Answers (2)

Related Questions