Reputation: 3656
I have the data frame:
mat=data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
The question is: how do I remove columns that contain too many zeros, e.g. more than 50% of their values? In this example, column B would have to be removed.
Ideally I would set a threshold such as nrow(mat) * 0.5 and then remove every column whose zero count exceeds that threshold.
Upvotes: 2
Views: 2221
Reputation: 174813
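Here is one way: compute the fraction of positive entries in each column and keep only the columns where that fraction exceeds 0.5: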
> mat <- data.frame(A=c(12,10,0,14,0,60),B=c(0,0,0,0,13,65))
>
> keep <- (colSums(mat > 0) / nrow(mat)) > 0.5
> keep
    A     B
 TRUE FALSE
>
> mat[, keep, drop = FALSE]
   A
1 12
2 10
3  0
4 14
5  0
6 60
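Note that the test above, mat > 0, counts positive entries, so negative values are treated like zeros, and a column with exactly 50% zeros is dropped. If the intent is literally "drop columns with more than 50% exact zeros", a minimal sketch could look like the following. The helper name drop_zero_heavy and its max_zero_frac argument are made up for illustration, and the sketch assumes the data frame is all numeric with no NA values.

```r
# Hypothetical helper (not part of the original answer): drop columns whose
# share of exact zeros is greater than `max_zero_frac`.
# Assumes an all-numeric data frame without NA values.
drop_zero_heavy <- function(df, max_zero_frac = 0.5) {
  zero_frac <- colMeans(df == 0)  # proportion of zeros in each column
  df[, zero_frac <= max_zero_frac, drop = FALSE]
}

mat <- data.frame(A = c(12, 10, 0, 14, 0, 60), B = c(0, 0, 0, 0, 13, 65))

res <- drop_zero_heavy(mat)
names(res)  # "A"

# The same rule written as a count threshold, as phrased in the question:
keep <- colSums(mat == 0) <= nrow(mat) * 0.5
```

Here colMeans(df == 0) gives 2/6 for A and 4/6 for B, so only A is kept. drop = FALSE keeps the result a data frame even when a single column survives.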
Upvotes: 5