Henk

Reputation: 3656

R count values larger than zero in data frame columns

I have the data frame:

mat=data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))

The question is: how do I filter out columns with too many zeros (e.g. more than 50%)? For example, column B would have to be removed.

It would be great to set a threshold such as nrow(mat) * 0.5 and then remove the columns whose zero count is above that threshold.

Upvotes: 2

Views: 2221

Answers (1)

Gavin Simpson

Reputation: 174813

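Here is one way: compute the proportion of values greater than zero in each column, and keep only the columns where that proportion exceeds 0.5: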

> mat <- data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
> 
> keep <- (colSums(mat > 0) / nrow(mat)) > 0.5
> keep
    A     B 
 TRUE FALSE 
> 
> mat[, keep, drop = FALSE]
   A
1 12
2 10
3  0
4 14
5  0
6 60
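
For the count-based threshold the question asks about (nrow(mat) * 0.5), a minimal sketch could look like the following. The helper name drop_zero_heavy and its max_zero_frac argument are made up for illustration and are not part of the answer above; the sketch also assumes all columns are numeric.

```r
mat <- data.frame(A = c(12, 10, 0, 14, 0, 60), B = c(0, 0, 0, 0, 13, 65))

# Hypothetical helper (name and argument are assumptions, not from the answer):
# drop every column whose zero count exceeds max_zero_frac * nrow(df).
drop_zero_heavy <- function(df, max_zero_frac = 0.5) {
  threshold <- nrow(df) * max_zero_frac
  # Count exact zeros per column; with na.rm = TRUE, NAs are not counted as zeros.
  zero_count <- colSums(df == 0, na.rm = TRUE)
  # Keep columns at or below the threshold; drop = FALSE keeps a data frame
  # even when only one column survives.
  df[, zero_count <= threshold, drop = FALSE]
}

res <- drop_zero_heavy(mat)
names(res)
```

This differs from the proportion-based version in two small ways: a column with exactly 50% zeros is kept here (its zero count is not above the threshold), and negative values are not treated as zeros, whereas mat > 0 counts only strictly positive entries.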

Upvotes: 5
