Reputation: 715
I have a dataframe:
df = data.frame(gene = c("a", "b", "c", "d", "e"),
value1 = c(NA, NA, NA, 2, 1),
value2 = c(NA, 1, 2, 3, 4),
value3 = c(NA, NA, NA, NA, 1))
I would like to keep all those columns (plus the first, gene) with more than or equal to atleast 2 valid values (i.e., not NA). How do I do this?
I am thinking something like this ...
df1 = df %>% select_if(function(.) ...)
Thanks
Upvotes: 1
Views: 41
Reputation: 886968
We can sum
the non-NA elements and create a logical condition to select
the columns of interest
library(dplyr)
df1 <- df %>%
select_if(~ sum(!is.na(.)) > 2)
df1
# gene value2
#1 a NA
#2 b 1
#3 c 2
#4 d 3
#5 e 4
Or another option is keep
library(purrr)
keep(df, ~ sum(!is.na(.x)) > 2)
Or create the condition based on the number of rows
df %>%
select_if(~ mean(!is.na(.)) > 0.5)
Or use Filter
from base R
Filter(function(x) sum(!is.na(x)) > 2, df)
Upvotes: 4
Reputation: 388817
We can use colSums
in base R to count the non-NA value per column
df[colSums(!is.na(df)) > 2]
# gene value2
#1 a NA
#2 b 1
#3 c 2
#4 d 3
#5 e 4
Or using apply
df[apply(!is.na(df), 2, sum) > 2]
Upvotes: 2