Reputation: 1122
I have two matrices: one from an experiment (df1) and one used as a reference (df2). Both hold semi-quantitative values from specimens, ranging from 1 to 50. For each row of df1, I would like to check whether it is identical to a row of the reference, i.e. whether all of its values match.
df1:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 6 14 32 38 40 48
[2,] 1 12 17 20 36 47
[3,] 7 15 29 33 40 42
[4,] 7 13 28 33 35 48
[5,] 1 2 13 36 38 41
[6,] 12 20 37 38 41 48
[7,] 13 14 28 34 36 43
...more rows
df2:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 5 12 14 15 24 32
[2,] 4 5 13 22 34 47
[3,] 1 14 24 29 34 36
[4,] 7 13 28 33 35 48
[5,] 13 14 28 34 36 43
[6,] 4 10 13 17 29 30
[7,] 4 15 22 30 36 43
[8,] 1 11 18 36 41 48
[9,] 14 17 18 24 43 47
[10,] 13 24 32 34 41 47
...more rows
desired output:
V1 V2 V3 V4 V5 V6 V7
7 13 28 33 35 48 TRUE
13 14 28 34 36 43 TRUE
How can I compare every row of one matrix against another matrix and pull out the rows that match (TRUE)? Thanks.
Upvotes: 1
Views: 92
Reputation: 729
An alternative method using for(), which() and %in%:
# These random 0/1 matrices usually share at least one row.
# Run again if they do not.
data1 <- matrix(sample(c(0,1),60, replace = TRUE),ncol = 5)
data2 <- matrix(sample(c(0,1),60, replace = TRUE),ncol = 5)
# Build a 'helper' character string for each row.
# collapse="" is safe here only because every value is a single digit.
data1.str <- apply(data1, 1, paste0, collapse="")
data2.str <- apply(data2, 1, paste0, collapse="")
data.match <- c()
# Loop over the reference rows and collect the indices of matching data1 rows
for(i in seq_along(data2.str)){
  data.match <- append(data.match, which(data1.str %in% data2.str[i]))
}
# Duplicate rows in data2 would otherwise add the same index twice
data.match <- unique(data.match)
# Gives your matched rows already
data1[data.match,]
# For completeness to give desired output:
matched <- as.data.frame(data1)
matched$data.match <- rep(FALSE,nrow(matched))
matched$data.match[data.match] <- TRUE
> matched[which(matched$data.match == TRUE),]
V1 V2 V3 V4 V5 data.match
4 1 1 0 0 1 TRUE
6 0 1 1 1 1 TRUE
7 1 1 0 0 0 TRUE
9 0 0 0 0 0 TRUE
10 0 1 0 0 1 TRUE
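The loop above can also be collapsed into a single %in% call. Below is a minimal sketch of that idea, not part of the original answer: `row_key` is a hypothetical helper name, and the "_" separator is an assumption (any character that cannot occur in the values would do). The separator matters for multi-digit data such as the question's 1 to 50 range, where an empty separator makes rows like (1, 12) and (11, 2) produce the same string.

```r
# Two small matrices whose first rows collide when pasted with collapse = ""
a <- matrix(c(1, 12,
              3,  4), ncol = 2, byrow = TRUE)
b <- matrix(c(11, 2,
               3, 4), ncol = 2, byrow = TRUE)

# Hypothetical helper: one key string per row; "_" is an assumed separator
row_key <- function(m, sep = "_") apply(m, 1, paste, collapse = sep)

# Without a separator, row 1 of `a` and row 1 of `b` both become "112"
naive <- row_key(a, sep = "") %in% row_key(b, sep = "")

# With a separator, only the genuinely identical row matches
safe <- row_key(a) %in% row_key(b)

# Rows of `a` that also occur in `b`
a[safe, , drop = FALSE]
```

Here `naive` flags both rows while `safe` flags only the second one, so the separator version is the safer default unless every value is known to be a single digit.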
Upvotes: 1
Reputation: 11140
Here's one way of doing this:
x <- matrix(1:4, nrow=2)
[,1] [,2]
[1,] 1 3
[2,] 2 4
y <- matrix(c(1,2,5,4), nrow=2)
[,1] [,2]
[1,] 1 5
[2,] 2 4
do.call(paste, as.data.frame(x)) %in% do.call(paste, as.data.frame(y))
[1] FALSE  TRUE
I am guessing this should be faster than doing an inner_join by all columns.
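To connect this back to the question, here is a sketch that applies the same paste-and-%in% idea to the rows shown there and builds something close to the desired output. Only the rows visible in the question are used (the "...more rows" are left out), and the names `in_ref` and `result` are my own, chosen for illustration.

```r
# Rows shown in the question (the elided rows are omitted)
df1 <- matrix(c( 6, 14, 32, 38, 40, 48,
                 1, 12, 17, 20, 36, 47,
                 7, 15, 29, 33, 40, 42,
                 7, 13, 28, 33, 35, 48,
                 1,  2, 13, 36, 38, 41,
                12, 20, 37, 38, 41, 48,
                13, 14, 28, 34, 36, 43), ncol = 6, byrow = TRUE)
df2 <- matrix(c( 5, 12, 14, 15, 24, 32,
                 4,  5, 13, 22, 34, 47,
                 1, 14, 24, 29, 34, 36,
                 7, 13, 28, 33, 35, 48,
                13, 14, 28, 34, 36, 43,
                 4, 10, 13, 17, 29, 30,
                 4, 15, 22, 30, 36, 43,
                 1, 11, 18, 36, 41, 48,
                14, 17, 18, 24, 43, 47,
                13, 24, 32, 34, 41, 47), ncol = 6, byrow = TRUE)

# One key per row; paste() separates the values with a space by default
in_ref <- do.call(paste, as.data.frame(df1)) %in%
          do.call(paste, as.data.frame(df2))

# Keep only the matching rows and append the flag as a seventh column
result <- as.data.frame(df1[in_ref, , drop = FALSE])
result$V7 <- TRUE
result
```

Because paste() inserts a space between values, multi-digit entries cannot run together, so rows are compared as whole tuples. The match is position-independent: a row of df1 is flagged if it appears anywhere in df2, which is what the desired output in the question implies.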
Upvotes: 1