Reputation: 131
I am running a for loop over two matrices. One matrix (A) has a single column of ~100 strings (such as name1, name2, ..., name100). The other matrix (B) is bigger than A and has rows and columns containing both values and strings. Each name in A matches entries at some places in B. I would like to extract every entire row of B that contains a particular string from A and stack those rows in an output matrix.
So I am running the code below:
output <- NULL
for(K in 1:nrow(A)){
  print(K)
  for(cc in 1:nrow(B)){
    for(dd in 1:ncol(B)){
      if(toupper(A[K])==toupper(B[cc,dd])){
        output <- rbind(output,B[cc,])
      }
    }
  }
}
But it is too slow. How do you make this for loop more efficient in terms of running time?
Upvotes: 1
Views: 203
Reputation: 40871
Here's a fast solution that should give the same output as yours:
set.seed(13)
A <- matrix(letters[1:5])
B <- matrix(sample(letters, 12, replace=TRUE), 4)
x <- match(toupper(A), toupper(B), nomatch=0L)
x <- (x[x>0L]-1L) %% nrow(B) + 1L
output <- B[x, , drop=FALSE]
It works by using match to find the (vector) indices in B where A matches. It then converts those indices to row indices, and finally extracts those rows.
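To make the index arithmetic concrete, here is a minimal worked sketch on a hypothetical 3 x 2 matrix (the values below are invented for illustration and are not the data from this answer). R stores matrices column by column, so a linear position p in a matrix with n rows lies in row (p - 1) %% n + 1.

```r
# Hypothetical 3 x 2 character matrix, filled column by column
B <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 3)
A <- matrix(c("E", "a", "z"))

# Linear (column-major) position of the first match of each A element; 0 = no match
pos <- match(toupper(A), toupper(B), nomatch = 0L)

# Drop the non-matches, then map linear positions to row numbers
rows <- (pos[pos > 0L] - 1L) %% nrow(B) + 1L

# Extract the matched rows, keeping the matrix shape even for a single row
B[rows, , drop = FALSE]
```

Here "E" is found at linear position 5 (row 2) and "a" at position 1 (row 1), while "z" has no match. Keep in mind that match reports only the first occurrence of each element of A.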
Note that the row B[2,] is included twice in the output - is that really what you want? If not, change the last line to:
output <- B[unique(x), , drop=FALSE]
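Because match returns only the first position of each element of A, rows that contain a later occurrence of the same name are not picked up. If every matching row is wanted, one possible variant (a sketch on hypothetical data, not the code timed in this answer) builds a logical matrix with %in% and keeps the rows with at least one hit:

```r
# Hypothetical inputs, invented for illustration
A <- matrix(c("a", "b", "c"))
B <- matrix(c("a", "x", "B", "y",
              "q", "r", "a", "s",
              "z", "c", "t", "u"), nrow = 4)

# TRUE wherever a cell of B equals some element of A (case-insensitive);
# %in% returns a plain vector, so the matrix shape is restored with matrix()
hit <- matrix(toupper(B) %in% toupper(A), nrow = nrow(B))

# Keep each row with at least one hit, once, in B's row order
output_all <- B[rowSums(hit) > 0, , drop = FALSE]
```

Unlike the loop in the question, this returns each matching row once and in the row order of B, so it is only a drop-in replacement if duplicates and ordering do not matter.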
EDIT: Some timings. I removed the toupper calls since they dominate the times, and @Manuel Ramon didn't call toupper. Note that all our outputs are different! So some debugging is probably warranted ;-)
# Create huge A and B matrices
set.seed(13)
strs <- outer(letters, LETTERS, paste)
A <- matrix(strs)
B <- matrix(sample(strs, 1e7, replace=TRUE), 1e4)
# My solution: 0.24 secs
system.time({
  x <- match(A, B, nomatch=0L)
  x <- (x[x>0L]-1L) %% nrow(B) + 1L
  output1 <- B[unique(x), , drop=FALSE]
})
# @DWin's solution: 0.91 secs
system.time({
  idx <- unique(which(as.matrix(B) %in% A, arr.ind=TRUE) %% NROW(B))
  idx[idx==0] <- NROW(B)  # a remainder of 0 means the last row (the original hard-coded 4)
  output2 <- B[idx, , drop=FALSE]
})
# @Manuel Ramon's solution: 0.89 secs
system.time({
  id <- apply(B, 2, function(x) A %in% x)
  output3 <- B[apply(id,1,sum)>0, ]
})
Upvotes: 4
Reputation: 263481
The speed problem is not because of the for-loop. apply will probably be even slower. You need to pre-dimension your target object and assign values with indexing.
Or you need to think of a vectorized solution, such as the following, which works on Manuel's test case:
idx <- unique(which(toupper(as.matrix(B)) %in% toupper(A), arr.ind=TRUE) %% NROW(B))
idx[idx==0] <- NROW(B)  # a remainder of 0 means the last row (4 in this test case)
B[idx , ]
z1 z2 z3
1 a 1.5623285 a
4 c -1.2196311 f
2 g 0.2551535 b
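The pre-dimensioning advice at the start of this answer could look roughly like the sketch below. The inputs are hypothetical stand-ins for the question's A and B, and the names upperA and keep are invented for illustration. The idea is to allocate the result once, fill it by index, and subset B a single time, instead of growing output with rbind inside the loop.

```r
# Hypothetical small inputs standing in for the question's A and B
A <- matrix(c("a", "b", "c", "d"), ncol = 1)
B <- matrix(c("a", "g", "f", "c",
              "p", "q", "r", "s",
              "a", "b", "f", "f"), nrow = 4)

upperA <- toupper(A)       # convert once, not on every comparison
keep <- logical(nrow(B))   # pre-dimensioned result, initially all FALSE

for (cc in seq_len(nrow(B))) {
  # Assign by index instead of growing an object with rbind()
  keep[cc] <- any(toupper(B[cc, ]) %in% upperA)
}

# One subsetting step at the end
output <- B[keep, , drop = FALSE]
```

This keeps each matching row once, in the row order of B, which differs from the question's loop if a row matches several names.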
Upvotes: 5
Reputation: 2498
Here is an idea:
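A <- matrix(c('a','b','c','d'), ncol=1)
B <- data.frame(z1=c('a','g','f','c'), z2=rnorm(4), z3=c('a','b','f','f'))
# For each column of B, flag the rows whose value appears in A
# (x %in% A gives one flag per row of B; A %in% x would give one per element of A)
id <- apply(B, 2, function(x) x %in% A)
# Keep the rows of B that have at least one flagged cell
newB <- B[apply(id,1,sum)>0, ]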
Upvotes: 0