Gustavo

Reputation: 49

Filtering columns of a data frame according to another column

I need to filter the columns of the data frame below according to the number of samples each otu occurs in (that is, has a nonzero count).

   samples otu1 otu2 otu3 otu4 otu5
1        a    2    1    0    0    3
2        b    2    4    1    4    3
3        c    0    0    0    1    0
4        d    0    0    1    4    4
5        e    1    2    0    2    3
6        f    1    1    2    4    2
7        g    1    0    0    4    3
8        h    0    0    2    0    4
9        i    1    2    2    1    6
10       j    0    0    2    3    4

For example, keeping only the otus that occur in >=80% of the samples should give:

   samples otu4 otu5
1        a    0    3
2        b    4    3
3        c    1    0
4        d    4    4
5        e    2    3
6        f    4    2
7        g    4    3
8        h    0    4
9        i    1    6
10       j    3    4

Upvotes: 0

Views: 40

Answers (1)

akrun

Reputation: 886998

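We can use select with where(): keep samples, plus every numeric column whose proportion of nonzero values is at least 0.8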

library(dplyr)
df1 %>% 
    select(samples, where(~ is.numeric(.) && mean(. != 0) >= 0.8))

-output

#     samples otu4 otu5
#1        a    0    3
#2        b    4    3
#3        c    1    0
#4        d    4    4
#5        e    2    3
#6        f    4    2
#7        g    4    3
#8        h    0    4
#9        i    1    6
#10       j    3    4
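To see why only otu4 and otu5 pass, it can help to compute the proportion of nonzero samples for every otu first. The following is a minimal base R sketch, not part of the original answer; it rebuilds the same values as df1 so that it runs on its own, and `prevalence` and `threshold` are names chosen here for illustration.

```r
# Same values as df1 in the "data" section, rebuilt so the snippet is self-contained.
df1 <- data.frame(
  samples = letters[1:10],
  otu1 = c(2, 2, 0, 0, 1, 1, 1, 0, 1, 0),
  otu2 = c(1, 4, 0, 0, 2, 1, 0, 0, 2, 0),
  otu3 = c(0, 1, 0, 1, 0, 2, 0, 2, 2, 2),
  otu4 = c(0, 4, 1, 4, 2, 4, 4, 0, 1, 3),
  otu5 = c(3, 3, 0, 4, 3, 2, 3, 4, 6, 4),
  stringsAsFactors = FALSE
)

# df1[-1] != 0 is a logical matrix; each column mean is the
# proportion of samples in which that otu is nonzero.
prevalence <- colMeans(df1[-1] != 0)
prevalence
# otu1 otu2 otu3 otu4 otu5
#  0.6  0.5  0.6  0.8  0.9

# Keep samples plus the otus at or above the cutoff.
threshold <- 0.8  # assumed cutoff, taken from the question
result <- df1[c("samples", names(prevalence)[prevalence >= threshold])]
names(result)
# columns kept: samples, otu4, otu5
```

This is the same test that `mean(. != 0) >= 0.8` applies to each column inside select.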

Or, if we are using an older dplyr version (before 1.0.0, which introduced where()), use select_if. The character test keeps the samples column

df1 %>%
   select_if(~ is.character(.) || (is.numeric(.) && mean(. != 0) >= 0.8))
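The predicate given to select_if is evaluated once per column and must return a single TRUE or FALSE. As a rough illustration (not from the original answer), the same test can be written as a plain function and tried on individual vectors; `keep_col` and `toy` are hypothetical names used only here. Because `&&` short-circuits, `mean(x != 0)` is only evaluated for numeric columns.

```r
# keep_col is a hypothetical name for the per-column test.
keep_col <- function(x) {
  is.character(x) || (is.numeric(x) && mean(x != 0) >= 0.8)
}

keep_col(c("a", "b", "c"))                 # TRUE: character columns always pass
keep_col(c(0, 4, 1, 4, 2, 4, 4, 0, 1, 3))  # TRUE: 8 of 10 values are nonzero
keep_col(c(2, 2, 0, 0, 1, 1, 1, 0, 1, 0))  # FALSE: only 6 of 10 are nonzero

# Base R's Filter() applies the same test to each column of a data frame.
toy <- data.frame(id = c("x", "y"), often = c(1, 2), rarely = c(0, 3),
                  stringsAsFactors = FALSE)
kept <- names(Filter(keep_col, toy))
kept
# columns kept: id, often
```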

data

df1 <- structure(list(samples = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j"), otu1 = c(2L, 2L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 
0L), otu2 = c(1L, 4L, 0L, 0L, 2L, 1L, 0L, 0L, 2L, 0L), otu3 = c(0L, 
1L, 0L, 1L, 0L, 2L, 0L, 2L, 2L, 2L), otu4 = c(0L, 4L, 1L, 4L, 
2L, 4L, 4L, 0L, 1L, 3L), otu5 = c(3L, 3L, 0L, 4L, 3L, 2L, 3L, 
4L, 6L, 4L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))

Upvotes: 2
