Reputation: 131
Suppose I have an m x n matrix M1 and a k x l matrix M2 with l <= n. I want to find all rows of M1 that contain all the values of some row of M2 (in any order and position).
For example, consider the following situation:
> M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
> M1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> M2
     [,1] [,2]
[1,]    1    3
[2,]    8    9
Rows one and three of M1 then fulfill the condition, since row one contains 1 and 3 and row three contains 8 and 9.
So how can I do this efficiently? I have written code using loops, but since I am working with very large matrices, that solution takes too much time.
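To pin the condition down precisely, here is a minimal brute-force sketch (it is not the loop code mentioned above, which is not shown). A row of M1 qualifies if, for at least one row of M2, every value of that M2 row appears somewhere in the M1 row, in any order. The function name rows_containing_naive is hypothetical; it is mainly useful as a reference for checking faster versions.

```r
# Hypothetical brute-force reference: which rows of M1 contain all values of
# at least one row of M2 (order ignored)? Does nrow(M1) * nrow(M2) checks.
rows_containing_naive <- function(M1, M2) {
  keep <- vapply(seq_len(nrow(M1)), function(i) {
    # For row i of M1: does any row of M2 have all of its values in it?
    any(apply(M2, 1, function(r) all(r %in% M1[i, ])))
  }, logical(1))
  which(keep)
}

M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,3,8,9), nrow = 2, ncol = 2, byrow = TRUE)
rows_containing_naive(M1, M2)
# [1] 1 3
```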
Reputation: 1610
More general example:
M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol =2, byrow = TRUE)
First, use match to find, for each value of M1, the position of its first occurrence in M2 (NA if the value does not occur in M2).
ind <- match(M1, M2)
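match treats both matrices as plain vectors read column by column, and for each value it reports only the position of the first occurrence in the second argument. A small check on this M2:

```r
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)

# M2 read column by column, which is the order match() sees
as.vector(M2)
# [1]  1  6 10 16 19  2  9 11 17  2

# 2 occurs at positions 6 and 10, but only the first (row 1) is reported
match(c(2, 5, 10), M2)
# [1]  6 NA  3
```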
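Next, take these indexes modulo the number of rows of M2 with the %% operator to recover the M2 row each value comes from. This works because R stores matrices in column-major order: the element in row i and column j of M2 has index i + (j - 1) * nrow(M2), so all values from the same row of M2 leave the same remainder. (The last row of M2 gives 0 instead of nrow(M2); that does not matter here, because only equality of the remainders is used.)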
rows <- ind %% nrow(M2)
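The index arithmetic behind this step can be checked directly on the 5 x 2 M2 from the example: every linear index equals row + (column - 1) * nrow(M2), so taking it modulo nrow(M2) gives the same value across each row.

```r
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)

# Linear (column-major) index of every cell of M2
lin <- matrix(seq_along(M2), nrow = nrow(M2))

# Each index equals row + (column - 1) * nrow(M2)
all(lin == row(M2) + (col(M2) - 1) * nrow(M2))
# [1] TRUE

# Same remainder across each row; the last row maps to 0
lin %% nrow(M2)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    2
# [3,]    3    3
# [4,]    4    4
# [5,]    0    0
```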
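Then m is a matrix with the same shape as M1 that holds, for every value of M1, the (remainder-coded) row of M2 it matched. A row of M1 is selected only if the same M2 row number appears in it twice (more generally, as many times as M2 has columns); since duplicated flags every occurrence after the first, that means ncol(M2) - 1 duplicates. This ensures that a row of M1 is only considered if it contains all elements of a row of M2, provided each value occurs at most once in M2 and at most once within each row of M1 (match only reports the first occurrence in M2).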
m <- matrix(rows, nrow = nrow(M1))
matchRows <- apply(m, 1, duplicated, incomparables = NA)
M1rows <- which(colSums(matchRows)==ncol(M2)-1)
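The three steps can be wrapped into one function for reuse. This is a minimal sketch with a hypothetical name (match_rows) that keeps the answer's logic unchanged, so it inherits the same assumptions: M1 and M2 have at least two columns, and each value occurs at most once in M2 and at most once within each row of M1.

```r
# Hypothetical wrapper around the steps above (same logic as the answer).
# Assumes ncol(M1) >= 2, ncol(M2) >= 2, each value at most once in M2,
# and each value at most once within a row of M1.
match_rows <- function(M1, M2) {
  ind  <- match(M1, M2)                 # first position in M2 of each M1 value, NA if absent
  rows <- ind %% nrow(M2)               # M2 row of that position (0 stands for the last row)
  m    <- matrix(rows, nrow = nrow(M1)) # back to the shape of M1
  dup  <- apply(m, 1, duplicated, incomparables = NA)  # one column per row of M1
  which(colSums(dup) == ncol(M2) - 1)   # rows with exactly ncol(M2) - 1 repeated M2 row numbers
}

M1 <- matrix(c(1,2,3,1,2,3,4,5,6,7,8,9,10,11,12,13,1,2), nrow = 6, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,2,6,9, 10,11,16,17, 19, 2), nrow = 5, ncol = 2, byrow = TRUE)
match_rows(M1, M2)
# [1] 1 2 5 6
```

On the question's original 3 x 3 example the same function returns rows 1 and 3.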
Reputation: 9313
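This method checks each row of M2 and returns the index of the row of M1 that contains it, or NA if no row does. It relies on the values of that M2 row occurring in only one row of M1: if any of them also appears in another row of M1, the result is NA even when one row contains them all.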
M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
> M1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)
> M2
     [,1] [,2]
[1,]    1    3
[2,]    8    5
[3,]    4    5
[4,]    1    2
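y = apply(M2,1,function(x){
  # every value of this M2 row must occur somewhere in M1
  if (!all(x %in% M1)) return(NA)
  # rows of M1 holding those values (linear index %% nrow; 0 means the last row)
  z = unique(which(M1 %in% x)%%nrow(M1))
  ifelse(length(z)==1,ifelse(z==0,nrow(M1),z),NA)
})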
> y
[1]  1 NA  2  1
This means that row 2 of M2 is not contained in any row of M1, rows 1 and 4 of M2 are contained in row 1 of M1, and row 3 of M2 is contained in row 2 of M1.
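Since the question asks for the rows of M1 rather than a result per row of M2, y can be reduced to that by dropping NA and repeats. A self-contained sketch that recomputes y with the same idea, plus a check that every value of the M2 row occurs in M1:

```r
M1 <- matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3, byrow = TRUE)
M2 <- matrix(c(1,3,8,5,4,5,1,2), nrow = 4, ncol = 2, byrow = TRUE)

y <- apply(M2, 1, function(x) {
  if (!all(x %in% M1)) return(NA)              # some value of x is missing from M1
  z <- unique(which(M1 %in% x) %% nrow(M1))    # M1 rows holding x's values (0 = last row)
  if (length(z) == 1) { if (z == 0) nrow(M1) else z } else NA
})

# Rows of M1 that contain at least one row of M2
sort(unique(y[!is.na(y)]))
# [1] 1 2
```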