Reputation: 31
I would like to know how to remove rows from a data frame that have fewer than some number (let's say 5) of non-zero entries.
The closest I've come is:
length(which(df[1,] > 0)) >= 5
but how do I apply this to the whole data frame and drop the rows for which it is FALSE? Is there a function similar to Excel's COUNTIF() that I can apply here?
Thank you for your help.
Upvotes: 3
Views: 4248
Reputation: 4970
You can also use a for-loop.
We first create a matrix of zeros and ones to test our code. Row 2 has to be excluded because it has fewer than 5 non-zero values.
In the loop we count the number of non-zero values per row and record TRUE if this count is less than 5 (FALSE otherwise). The vector named 'drop' holds one TRUE/FALSE value per row. In the final step, we exclude those rows for which drop is TRUE.
mat <- matrix(c(1,1,1,1,0,1,1,1,1,1,1,1,1,1,1), nrow=3, ncol=5)
mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    0    1    1    1
[3,]    1    1    1    1    1
drop <- NULL
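for (i in seq_len(nrow(mat))) {
  # number of non-zero entries in row i, ignoring missing values
  count.non.zero <- sum(mat[i, ] != 0, na.rm = TRUE)
  # TRUE marks a row with fewer than 5 non-zero entries
  drop <- c(drop, count.non.zero < 5)
}
# keep only the rows that are not marked for removal
mat[!drop, ]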
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
NOTE: na.rm=TRUE allows this script to work when your data contains missing values.
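To illustrate the note about missing values, here is a minimal sketch that repeats the per-row count on a matrix containing an NA. The matrix mat_na, its values, and the names drop_row and kept are made up for this example and are not part of the answer above; the logical vector is preallocated here rather than grown inside the loop.

```r
# Hypothetical 3 x 5 matrix with one zero and one missing value
mat_na <- matrix(1, nrow = 3, ncol = 5)
mat_na[2, 2] <- 0
mat_na[3, 4] <- NA

# Without na.rm, the count for a row containing NA is itself NA
sum(mat_na[3, ] != 0)
# With na.rm = TRUE, the NA is skipped and the other entries are counted
sum(mat_na[3, ] != 0, na.rm = TRUE)

# Same idea as the loop above, with a preallocated logical vector
drop_row <- logical(nrow(mat_na))
for (i in seq_len(nrow(mat_na))) {
  drop_row[i] <- sum(mat_na[i, ] != 0, na.rm = TRUE) < 5
}

# drop = FALSE keeps the result a matrix even if only one row remains
kept <- mat_na[!drop_row, , drop = FALSE]
```

In this sketch a missing value is simply not counted, so row 3 ends up with 4 non-zero entries and is removed along with row 2.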
Upvotes: 0
Reputation: 7232
You can use boolean values in rowSums and in [:
df[ rowSums(df > 0) >= 5, ]
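As a small sketch of how this one-liner behaves on concrete data, the example below builds a toy data frame and applies the same expression in two stages. The data frame, its column names, and the names counts and kept are invented for illustration; rowSums() is base R.

```r
# Hypothetical data frame (values and column names are assumptions)
df <- data.frame(
  a = c(1, 0, 3),
  b = c(2, 0, 1),
  c = c(4, 5, 2),
  d = c(1, 0, 7),
  e = c(3, 1, 0),
  f = c(2, 0, 5)
)

# Number of entries greater than zero in each row
counts <- rowSums(df > 0)

# Keep only the rows with at least 5 such entries
kept <- df[counts >= 5, ]
```

Here counts is 6, 2 and 5, so rows 1 and 3 are kept. Note that df > 0 counts only positive values; if negative values should also count as non-zero, df != 0 can be used in its place.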
There are 3 steps hidden in this expression:
df > 0 produces a logical matrix that is TRUE wherever an element is > 0
rowSums returns the number of such elements in each row (when summing, it treats TRUE as 1 and FALSE as 0)
[ selects only the rows where that count is >= 5
Upvotes: 3