aurelius_37809

Reputation: 195

Delete columns with several zeros

I need to remove all columns that contain more than one zero. I am currently using df <- df[, colSums(df != 0) > 1], but this does not remove all of the columns that have many zeros. How can this be fixed, or approached a different way?

> tibble(df)
# A tibble: 551 x 1,046
     `aa`   `ab`  `ac`  `ad`  `ae`  `af`
    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
 1  32458  65068 32654     0 43115  1450
 2  19387  38457 19447     0 22523   958
 3  42690  85105 43247     0 14156  1088
 4  62290 123325 61878 58422 36300  1145
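The attempt above counts non-zero values rather than zeros: colSums(df != 0) > 1 keeps every column with at least two non-zero entries, so a column with many zeros survives as long as two of its values are non-zero. A minimal sketch on a small made-up data frame (the columns x, y, z and their values are hypothetical) shows the difference from counting the zeros directly:

```r
# Hypothetical toy data: x has no zeros, y has three zeros, z has one zero
df <- data.frame(
  x = c(5, 3, 8, 1, 2),
  y = c(0, 0, 4, 0, 7),
  z = c(0, 6, 9, 2, 4)
)

# Original attempt: keeps columns with MORE THAN ONE NON-ZERO value,
# so y (three zeros, but two non-zero values) is kept
kept_original <- df[, colSums(df != 0) > 1]
names(kept_original)   # "x" "y" "z"

# Intended rule: count the zeros and keep columns with at most one zero
kept_fixed <- df[, colSums(df == 0) <= 1]
names(kept_fixed)      # "x" "z"
```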

Upvotes: 2

Views: 74

Answers (2)

akrun

Reputation: 886938

We may use select with where() to keep only the columns in which the proportion of zeros, i.e. the mean of the logical vector . %in% 0, is less than 0.7

library(dplyr)
df %>%
    select(where(~ mean(. %in% 0) < 0.7))

-output

     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145

If it is to remove columns with more than one zero value, use sum to count the zeros instead

df %>%
    select(where(~ sum(. %in% 0) < 2))

-output

     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145

Or a similar option in base R

Filter(function(x) mean(x %in% 0) < 0.7, df)

-output

     aa     ab    ac    ae   af
1 32458  65068 32654 43115 1450
2 19387  38457 19447 22523  958
3 42690  85105 43247 14156 1088
4 62290 123325 61878 36300 1145

Or, using sum to count the zeros

Filter(function(x) sum(x %in% 0) < 2, df)

data

df <- structure(list(aa = c(32458L, 19387L, 42690L, 62290L), ab = c(65068L, 
38457L, 85105L, 123325L), ac = c(32654L, 19447L, 43247L, 61878L
), ad = c(0L, 0L, 0L, 58422L), ae = c(43115L, 22523L, 14156L, 
36300L), af = c(1450L, 958L, 1088L, 1145L)),
 class = "data.frame", row.names = c("1", 
"2", "3", "4"))

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

Maybe you can try colMeans, which gives the proportion of zeros in each column, like below

df[colMeans(df == 0) < 0.7]
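One detail of this form: df[...] with a single logical index selects columns list-style, which always returns a data frame. The df[, ...] form used in the question drops to a plain vector when only one column is left, unless drop = FALSE is given. A small sketch with hypothetical toy columns (keep, mostly_zero):

```r
# Hypothetical toy data: only the column "keep" passes the filter
df <- data.frame(keep = c(1, 2, 3, 4), mostly_zero = c(0, 0, 0, 5))

mask <- colMeans(df == 0) < 0.7   # keep = TRUE, mostly_zero = FALSE

# List-style indexing, as in the answer: result stays a data.frame
res_list <- df[mask]

# Matrix-style indexing: a single remaining column is dropped to a vector
res_matrix <- df[, mask]

# drop = FALSE keeps the data.frame shape with matrix-style indexing
res_keep <- df[, mask, drop = FALSE]
```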

Upvotes: 3
