StefanH

Reputation: 131

Find rows of matrix which contain rows of another matrix

Suppose I have an m x n matrix M1 and a k x l matrix M2 with l <= n. I want to find all rows of M1 that contain every element of some row of M2.

For example consider the following situation:

> M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
> M1
      [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    8    9
> M2
      [,1] [,2]
 [1,]    1    3
 [2,]    8    9

Then rows one and three of M1 fulfill the condition, as row one contains 1 and 3 and the last row 8 and 9.
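To pin the condition down, here is a deliberately slow reference sketch. The function name `contains_row` is made up for illustration, and it assumes that order and position within a row do not matter. It is only meant as a baseline to check faster solutions against.

```r
# Reference sketch (illustrative name): TRUE for each row of M1 that holds
# every value of at least one row of M2, ignoring order and position.
contains_row <- function(M1, M2) {
  vapply(seq_len(nrow(M1)), function(i) {
    any(vapply(seq_len(nrow(M2)),
               function(j) all(M2[j, ] %in% M1[i, ]),
               logical(1)))
  }, logical(1))
}

M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
which(contains_row(M1, M2))   # 1 3
```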

So how can I achieve this efficiently? I have written code using loops, but as I am working with very large matrices, that solution takes too much time.

Upvotes: 1

Views: 557

Answers (2)

Paulo MiraMor

Reputation: 1610

A more general example:

M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol =2, byrow = TRUE)

First, use match to find, for each value in M1, the index of its first occurrence in M2 (NA if it does not occur).

ind <- match(M1, M2)

Now apply the mod operator %% to those indexes with the number of rows of M2 to find the M2 row of each match. This works because R stores matrices column by column, so the linear index of an element is its row number plus a multiple of nrow(M2). Values in the same row of M2 therefore give the same remainder (the last row gives 0).

rows <- ind %% nrow(M2)
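As a quick check of that index arithmetic, the following sketch compares the remainders with the true row numbers of M2. The variable `idx` is introduced here for illustration only. Note that the last row shows up as 0 rather than 5, which is harmless as long as the value is used only as a label.

```r
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)

idx <- seq_along(M2)           # linear (column-major) indices 1..10
idx %% nrow(M2)                # 1 2 3 4 0 1 2 3 4 0
row(M2)[idx]                   # 1 2 3 4 5 1 2 3 4 5 (true row numbers)
((idx - 1) %% nrow(M2)) + 1    # same as row(M2)[idx], if real row numbers are needed
```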

Then m is a matrix with the same shape as M1 that holds, for each value of M1, the M2 row it matched (NA if none). A row of M1 is selected only if the same M2 row number appears in it 2 times (or, more generally, as many times as M2 has columns). This ensures that a row of M1 is only considered if it contains all elements of a row in M2.

m <- matrix(rows, nrow = nrow(M1))
matchRows <- apply(m, 1, duplicated, incomparables = NA)
M1rows <- which(colSums(matchRows)==ncol(M2)-1)
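For convenience, the steps can be wrapped in a function and run on the example. This is only a sketch: the name `rows_by_match` is made up, and it assumes M1 has at least two columns. Because match reports only the first occurrence, a value that is repeated in M2 (like 2 here) is always attributed to the first row that holds it, and a value repeated within a row of M1 is counted twice, so it is safest with matrices that have no repeated values.

```r
# Hypothetical wrapper around the steps above; the name is illustrative.
rows_by_match <- function(M1, M2) {
  ind  <- match(M1, M2)                   # position in M2 of each M1 value (first hit only)
  rows <- ind %% nrow(M2)                 # M2 row label of each hit (last row becomes 0)
  m    <- matrix(rows, nrow = nrow(M1))   # same shape as M1
  dup  <- apply(m, 1, duplicated, incomparables = NA)
  which(colSums(dup) == ncol(M2) - 1)     # rows of M1 with ncol(M2) hits on one M2 row
}

M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)
rows_by_match(M1, M2)   # 1 2 5 6
```

Rows 1, 2 and 6 of M1 contain 1 and 2, and row 5 contains 10 and 11. Row 3 contains 6 but not 9, so it is not selected.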

Upvotes: 1

R. Schifini

Reputation: 9313

This method checks each row of M2 and returns the index of the M1 row that contains it, or NA if there is none:

M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)
> M2
     [,1] [,2]
[1,]    1    3
[2,]    8    5
[3,]    4    5
[4,]    1    2

y = apply(M2,1,function(x){
  z = unique(which(M1 %in% x)%%nrow(M1))
  ifelse(length(z)==1,ifelse(z==0,nrow(M1),z),NA)
})

> y
[1]  1 NA  2  1

This means that row 2 of M2 is not contained in any row of M1, and that rows 1 and 4 of M2 are both contained in row 1 of M1. Row 3 of M2 is contained in row 2 of M1.
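One caveat: the test length(z)==1 is also satisfied when only one value of an M2 row occurs in M1 (for example a row c(1, 100) would return 1), and a row of M2 that is contained in several rows of M1 gives NA. A possible variant, sketched below under the assumption of numeric matrices without NA, checks every value explicitly and keeps all matches. The helper name `containment` is made up for illustration.

```r
# Illustrative helper, not from the answer above. Assumes numeric matrices without NA.
# Returns a logical matrix: entry [i, j] is TRUE when row i of M1 contains
# every value of row j of M2.
containment <- function(M1, M2) {
  hits <- vapply(seq_len(nrow(M2)), function(j) {
    per_value <- lapply(unique(M2[j, ]), function(v) rowSums(M1 == v) > 0)
    Reduce(`&`, per_value)   # all values of row j present in the same M1 row
  }, logical(nrow(M1)))
  matrix(hits, nrow = nrow(M1))
}

M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)

hit <- containment(M1, M2)
which(rowSums(hit) > 0)    # rows of M1 that contain some row of M2: 1 2
which(colSums(hit) == 0)   # rows of M2 found in no row of M1: 2
```

This still loops over the rows of M2, but each step is a vectorised comparison against the whole of M1, so it may be acceptable when M2 is much smaller than M1.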

Upvotes: 1
