geotheory

Reputation: 23670

Efficiently and elegantly check which rows of a matrix exist in another

Say we want to check which rows of one matrix (or data frame) also exist in another. Every solution I've found for this surely basic operation either requires a library (this {data.table} 4-liner) or is verbose and obscure, such as:

(m1 = matrix(1:10, ncol=2))
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
(m2 = matrix(c(1,3,4,-1,6,7,9,8), ncol=2))
     [,1] [,2]
[1,]    1    6
[2,]    3    7
[3,]    4    9
[4,]   -1    8

# ugh!
rowSums(outer(m2[,1], m1[,1], "==") & outer(m2[,2], m1[,2], "==")) != 0
[1]  TRUE FALSE  TRUE FALSE

Does anyone know a more elegant method using only base functions that is as efficient as this example? (NB: apply() is not as efficient.)

Upvotes: 0

Views: 135

Answers (1)

DSquare

Reputation: 2468

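Following this answer, you can use match. Transposing each matrix and converting it to a data.frame turns every row into one column, so match compares whole rows. The result gives, for each row of m1, the index of the matching row in m2 (or NA if there is none):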

> m1 = matrix(1:10, ncol=2)
> m2 = matrix(c(1,3,4,-1,6,7,9,8), ncol=2)
> m <- match(data.frame(t(m1)), data.frame(t(m2)))
> m
[1]  1 NA NA  3 NA

You can easily change the result of match to suit your preferred format:

> !is.na(m)
[1]  TRUE FALSE FALSE  TRUE FALSE
> which(!is.na(m))
[1] 1 4
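Note that the call above looks up the rows of m1 in m2. If you want the direction used in the question (which rows of m2 exist in m1), swapping the arguments should do it. This is only a sketch on the example data; the names idx and in_m1 are mine, not from the question:

```r
m1 <- matrix(1:10, ncol = 2)
m2 <- matrix(c(1, 3, 4, -1, 6, 7, 9, 8), ncol = 2)

# Each column of data.frame(t(x)) holds one row of x,
# so match() looks up whole rows of m2 among the rows of m1.
idx <- match(data.frame(t(m2)), data.frame(t(m1)))

# Logical vector with one entry per row of m2
in_m1 <- !is.na(idx)
in_m1
```

On this data in_m1 agrees with the result of the outer() expression in the question.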

You can retrieve the rows using any of those variants:

> m1[!is.na(m),]
     [,1] [,2]
[1,]    1    6
[2,]    4    9

If you want the matching rows themselves rather than their indexes, just use merge (it returns a data.frame):

> merge(m1, m2)
  V1 V2
1  1  6
2  4  9
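If you would rather get a logical vector from a small reusable function, one base-only alternative (a different technique from match on data frames) is to paste each row into a single string key and compare the keys with %in%. The helper name rows_in is hypothetical, and the sketch assumes the values print exactly as text (integers or integer-valued doubles, as in the example), since the comparison happens on strings:

```r
m1 <- matrix(1:10, ncol = 2)
m2 <- matrix(c(1, 3, 4, -1, 6, 7, 9, 8), ncol = 2)

# Hypothetical helper: TRUE for each row of x that also occurs in table.
rows_in <- function(x, table) {
  # Collapse every row into one string; "\r" is an assumed separator
  # that should not occur in the data itself.
  key <- function(m) do.call(paste, c(as.data.frame(m), sep = "\r"))
  key(x) %in% key(table)
}

rows_in(m2, m1)  # one entry per row of m2
rows_in(m1, m2)  # one entry per row of m1
```

Because the work is done by paste and %in%, both of which are vectorised, this avoids the row-by-row apply() loop the question wanted to steer clear of. I have not benchmarked it against the outer() version.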

Upvotes: 2
