How to filter rows for every column independently using dplyr

I have the following tibble:

library(tidyverse)
df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a",   1,  2,
  "b",   2,  3,
  "c",   3,  4,
  "d",   1,  1
)

df
#> # A tibble: 4 x 3
#>    gene  colB  colC
#>   <chr> <dbl> <dbl>
#> 1     a     1     2
#> 2     b     2     3
#> 3     c     3     4
#> 4     d     1     1

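What I want to do is filter every column after the gene column, keeping only values greater than or equal to 2 (>= 2): values below 2 should become NA, and rows with no qualifying value (such as d) should be dropped. The result should be: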

gene  colB  colC
a       NA     2
b        2     3
c        3     4

How can I achieve that?

In my real data there are more than two columns after gene.

Upvotes: 2

Answers (4)

jkatam

Alternatively, we could try a rowwise approach with c_across:

df %>% 
  rowwise() %>% 
  filter(any(c_across(starts_with('col')) >= 2)) %>% 
  mutate(across(starts_with('col'), ~ ifelse(. >= 2, ., NA)))

Created on 2023-02-05 with reprex v2.0.2

# A tibble: 3 × 3
# Rowwise: 
  gene   colB  colC
  <chr> <dbl> <dbl>
1 a        NA     2
2 b         2     3
3 c         3     4

Upvotes: 0

akrun

We can use data.table

library(data.table)
setDT(df)[df[, Reduce(`|`, lapply(.SD, `>=`, 2)), .SDcols = colB:colC]
   ][, (2:3) := lapply(.SD, function(x) replace(x, x < 2, NA)), .SDcols = colB:colC][]
#   gene colB colC
#1:    a   NA    2
#2:    b    2    3
#3:    c    3    4
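The row filter above relies on Reduce(`|`, lapply(.SD, `>=`, 2)): lapply compares each column with 2, and Reduce ORs the resulting logical vectors element-wise into one row mask. The base R sketch below shows that step, plus the replace step, in isolation. The list cols is a hypothetical stand-in for the colB and colC columns of df, not part of the original answer.

```r
# Hypothetical stand-in for the value columns of df (colB and colC)
cols <- list(colB = c(1, 2, 3, 1), colC = c(2, 3, 4, 1))

# One logical vector per column: is each value >= 2?
per_col <- lapply(cols, `>=`, 2)

# OR the vectors together element-wise: TRUE if any column qualifies
keep <- Reduce(`|`, per_col)
keep
#> [1]  TRUE  TRUE  TRUE FALSE

# Set values below 2 to NA in each column, then keep only the masked rows
cleaned <- lapply(cols, function(x) replace(x, x < 2, NA)[keep])
cleaned$colB
#> [1] NA  2  3
```

In the data.table version, .SD plays the role of cols, and .SDcols restricts it to the value columns.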

Or with melt/dcast

dcast(melt(setDT(df), id.vars = 'gene')[value >= 2], gene ~ variable)
#   gene colB colC
#1:    a   NA    2
#2:    b    2    3
#3:    c    3    4

Upvotes: 0

alistaire

The forthcoming dplyr 0.6 (install from GitHub now, if you like) has filter_at, which can be used to keep every row that has at least one value greater than or equal to 2. Values below 2 can then be set to NA through mutate_at, so

df %>% 
    filter_at(vars(-gene), any_vars(. >= 2)) %>% 
    # na_if() only matches exact values, so use replace() for a condition
    mutate_at(vars(-gene), funs(replace(., . < 2, NA)))
#> # A tibble: 3 x 3
#>    gene  colB  colC
#>   <chr> <dbl> <dbl>
#> 1     a    NA     2
#> 2     b     2     3
#> 3     c     3     4

or similarly,

df %>% 
    mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>% 
    filter_at(vars(-gene), any_vars(!is.na(.)))

which can be translated for use with dplyr 0.5:

df %>% 
    mutate_at(vars(-gene), funs(replace(., . < 2, NA))) %>% 
    # keep rows where at least one non-gene column is not NA
    filter(rowSums(is.na(.)) < (ncol(.) - 1))

All three return the same result.
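In dplyr 1.0.4 and later, filter_at/any_vars and mutate_at are superseded by if_any and across, and funs is deprecated. Below is a minimal sketch of the same pipeline with those verbs. It assumes a current dplyr installation and rebuilds df from the question so that the block is self-contained.

```r
library(dplyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a",   1,  2,
  "b",   2,  3,
  "c",   3,  4,
  "d",   1,  1
)

res <- df %>%
  # keep rows where at least one non-gene column is >= 2
  filter(if_any(-gene, ~ .x >= 2)) %>%
  # set every non-gene value below 2 to NA
  mutate(across(-gene, ~ replace(.x, .x < 2, NA)))
res
```

Because -gene selects every column except gene, this works unchanged however many value columns follow gene.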

Upvotes: 5

neilfws

One solution: convert from wide to long format, so you can filter on just one column, then convert back to wide at the end if required. Note that this will drop genes where no values meet the condition.

library(tidyverse)
df %>% 
  gather(name, value, -gene) %>% 
  filter(value >= 2) %>% 
  spread(name, value)

# A tibble: 3 x 3
   gene  colB  colC
* <chr> <dbl> <dbl>
1     a    NA     2
2     b     2     3
3     c     3     4
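In current tidyr (1.0.0 and later), gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same long-format approach with those functions. It assumes tidyr 1.1.0 or later for the names_sort argument. By default pivot_wider orders new columns by first appearance, which here would put colC before colB because a's colB value is filtered out. names_sort = TRUE orders them alphabetically instead, matching the spread output above.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
  ~gene, ~colB, ~colC,
  "a",   1,  2,
  "b",   2,  3,
  "c",   3,  4,
  "d",   1,  1
)

res <- df %>%
  # wide -> long: one row per (gene, column name, value)
  pivot_longer(-gene, names_to = "name", values_to = "value") %>%
  # a single filter now applies to every original column
  filter(value >= 2) %>%
  # long -> wide; missing combinations become NA and gene d disappears
  pivot_wider(names_from = name, values_from = value, names_sort = TRUE)
res
```

As with the gather/spread version, genes where no value meets the condition are dropped.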

Upvotes: 5
