Reputation: 195
I need to remove all columns that contain more than one zero. I am currently using
df <- df[, colSums(df != 0) > 1]
but this does not remove all of the columns with many zeros. How can this be fixed, or approached a different way?
> tibble(df)
# A tibble: 551 x 1,046
   `aa`   `ab`  `ac`  `ad`  `ae`  `af`
  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 32458  65068 32654     0 43115  1450
2 19387  38457 19447     0 22523   958
3 42690  85105 43247     0 14156  1088
4 62290 123325 61878 58422 36300  1145
Upvotes: 2
Views: 74
Reputation: 886938
We may use select to keep the columns where the mean of the logical expression (i.e. the proportion of elements that are 0) is less than 0.7:
library(dplyr)
df %>%
select(where(~ mean(. %in% 0) < 0.7))
-output
     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145
If the goal is to remove columns with more than 1 zero value:
df %>%
select(where( ~sum(. %in% 0) < 2))
-output
     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145
Or a similar option in base R
Filter(function(x) mean(x %in% 0) < 0.7, df)
     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145
or using sum for the count of zeros
Filter(function(x) sum(x %in% 0) < 2, df)
df <- structure(list(aa = c(32458L, 19387L, 42690L, 62290L), ab = c(65068L,
38457L, 85105L, 123325L), ac = c(32654L, 19447L, 43247L, 61878L
), ad = c(0L, 0L, 0L, 58422L), ae = c(43115L, 22523L, 14156L,
36300L), af = c(1450L, 958L, 1088L, 1145L)),
class = "data.frame", row.names = c("1",
"2", "3", "4"))
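One reason to prefer %in% over == here is the handling of missing values: x %in% 0 is FALSE for an NA, whereas x == 0 is NA. The sketch below illustrates the difference on a small made-up data frame (toy and its columns are hypothetical, not taken from the question); it is only an illustration, assuming NAs should be counted as non-zero.

```r
# Hypothetical toy data with missing values (not from the question)
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(0, 0, NA, 5),
  c = c(0, 2, 3, 4)
)

# %in% never returns NA, so the zero counts are always defined
zero_counts <- vapply(toy, function(x) sum(x %in% 0), integer(1))

# == propagates NA, so na.rm = TRUE is needed to get the same counts
zero_counts_eq <- colSums(toy == 0, na.rm = TRUE)

# Keep the columns with fewer than 2 zeros; NAs are not counted as zeros
kept <- Filter(function(x) sum(x %in% 0) < 2, toy)
names(kept)
```

Without na.rm = TRUE, colSums(toy == 0) would return NA for any column containing a missing value, and using that result as a logical column index would fail.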
Upvotes: 1
Reputation: 101064
Maybe you can try colMeans, like below:
df[colMeans(df == 0) < 0.7]
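The colMeans test above filters by the proportion of zeros. For the count-based rule in the question (drop columns with more than one zero), a comparable sketch is to count zeros with colSums(df == 0). The original attempt, colSums(df != 0) > 1, keeps every column that has at least two non-zero values, so a column with many zeros survives as long as two entries are non-zero. The data below are the four sample rows posted earlier in the thread; zeros_per_col and result are names introduced here for illustration, and drop = FALSE is an added safeguard rather than part of the original code.

```r
# The four sample rows posted earlier in the thread
df <- data.frame(
  aa = c(32458L, 19387L, 42690L, 62290L),
  ab = c(65068L, 38457L, 85105L, 123325L),
  ac = c(32654L, 19447L, 43247L, 61878L),
  ad = c(0L, 0L, 0L, 58422L),
  ae = c(43115L, 22523L, 14156L, 36300L),
  af = c(1450L, 958L, 1088L, 1145L)
)

# Count the zeros in each column
zeros_per_col <- colSums(df == 0)

# Keep columns with at most one zero.
# drop = FALSE keeps the result a data frame even if only one column survives.
result <- df[, zeros_per_col <= 1, drop = FALSE]
names(result)
```

If the data may contain missing values, colSums(df == 0, na.rm = TRUE) avoids NA counts.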
Upvotes: 3