Reputation: 1117

Selecting rows based on multiple columns in R

I have a matrix like this:

 M2 <- matrix(c(1,0,0,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0),nrow=7, 
  dimnames=list(LETTERS[1:7],NULL))

I would like to select rows based on multiple columns. For instance, to select rows based on only two columns, I did:

 ans <- M2[which(M2[,1]==0 & M2[,2]==0), ]

But how do I select only those rows that are zero in three or four columns, say columns 1, 3, and 4, or columns 1, 2, 3, and 4?

Upvotes: 2

Answers (3)

Roland

Reputation: 132969

Just for fun, here is a solution that works for a data.frame and can be used for a large number of columns:

DF <- as.data.frame(M2)
DF[rowSums(sapply(DF[,c(1,2,4)],`!=`,e2=0))==0,]
#  V1 V2 V3 V4
#B  0  0  0  0
#F  0  0  0  0
#G  0  0  0  0

What happens here?

sapply loops over the columns of the subset DF[,c(1,2,4)]. It applies the function != (not equal) to each column of the subset and compares with 0 (e2 is the second argument of the != function). The result is a matrix of logical values (TRUE/FALSE).
rowSums takes the sum of each row of this logical matrix. Logical values are automatically coerced to 1/0.
We then test whether these row sums are 0 (i.e. no value in the row differs from 0).
The resulting logical vector is used for subsetting the rows.
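The steps above can be unpacked into separate statements. This is only a sketch of the same one-liner; the intermediate names nonzero, n_nonzero, keep and result are introduced here for illustration and are not part of the original answer.

```r
M2 <- matrix(c(1,0,0,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0), nrow = 7,
             dimnames = list(LETTERS[1:7], NULL))
DF <- as.data.frame(M2)

# Step 1: logical matrix, TRUE where a value in columns 1, 2 or 4 is not 0
nonzero <- sapply(DF[, c(1, 2, 4)], `!=`, e2 = 0)

# Step 2: count the non-zero entries in each row (TRUE is coerced to 1)
n_nonzero <- rowSums(nonzero)

# Step 3: a row qualifies when it has no non-zero entries
keep <- n_nonzero == 0

# Step 4: use the logical vector to subset the rows
result <- DF[keep, ]
```

Inspecting n_nonzero before the final comparison is a quick way to see how many of the chosen columns are non-zero in each row.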

Of course this is easier and faster with a matrix:

M2[rowSums(M2[,c(1,2,4)] != 0) == 0,]

Upvotes: 10

user1981275

Reputation: 13382

You could use rowSums:

M2[rowSums(M2[,c(1,2,3,4)]) == 0,]

This gives you all rows where columns 1, 2, 3 and 4 are all zero:

  [,1] [,2] [,3] [,4]
B    0    0    0    0
F    0    0    0    0
G    0    0    0    0

Please note that this won't work if your matrix contains both positive and negative numbers, because they can cancel out and give a row sum of zero.
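To illustrate the caveat, here is a small sketch with a made-up matrix m (not from the question) in which one row sums to zero without containing any zeros. Comparing each entry against 0 before summing, as in the other answer, avoids the problem.

```r
# Hypothetical data: row "a" is (1, -1), row "b" is (0, 0), row "c" is (2, 3)
m <- matrix(c(1, 0, 2,
              -1, 0, 3), nrow = 3,
            dimnames = list(c("a", "b", "c"), NULL))

# Plain row sums: row "a" is wrongly selected because 1 + (-1) == 0
naive <- rownames(m)[rowSums(m) == 0]

# Counting non-zero entries instead: only the all-zero row is selected
safe <- rownames(m)[rowSums(m != 0) == 0]
```

With this data, naive contains "a" and "b", while safe contains only "b".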

Upvotes: 4

Mayou

Reputation: 8848

Your question is not quite clear to me, but is this what you are looking for?

To select based on the values of columns 1 to 4, you can do the following:

ans <- M2[M2[,1]==0 & M2[,2]==0 & M2[,3]==0 & M2[,4]==0,]

 #> ans
 #  [,1] [,2] [,3] [,4]
 #B    0    0    0    0
 #F    0    0    0    0
 #G    0    0    0    0

This will result in the subset of M2 for which all columns 1 to 4 are zero.
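If the set of columns varies, the chained comparisons can be built programmatically instead of typed out. The following is one possible sketch; the variable names cols and all_zero are assumptions introduced here, and the columns 1, 3 and 4 are taken from the other example in the question.

```r
M2 <- matrix(c(1,0,0,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0), nrow = 7,
             dimnames = list(LETTERS[1:7], NULL))

# Columns to test (hypothetical choice taken from the question's example)
cols <- c(1, 3, 4)

# One logical vector per column, combined with & just like the hand-written chain
all_zero <- Reduce(`&`, lapply(cols, function(j) M2[, j] == 0))

# drop = FALSE keeps the result a matrix even if only one row matches
ans <- M2[all_zero, , drop = FALSE]
```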

Upvotes: 0
