Reputation: 51
In a 100x100 matrix populated only with integers, I am performing pairwise comparisons WITHIN each row beginning with (and including) element 1,1. For any comparison that is TRUE, I tally a +1 at the corresponding element in another preallocated matrix [x] (this is just a similarity matrix).
Using nested for loops, this operation requires N*(N-1)/2 + N comparisons for each row. On my machine, the code below doesn't take too long but is there a better (ok, faster and more elegant) way to this? I have considered a vectorized calculation using "apply" but as of yet, with no joy.
result <- matrix( round(rnorm(10000,sample(5))), ncol=100)
x <-matrix(data=0, nrow=100,ncol=100)
system.time(
for (i in 1:100) {
for (j in 1:100) {
for (k in j:100) {
if (result[i,][j] == result[i,][k]) {
x[j,][k] = x[j,][k] + 1
}
}
}
}
)
user system elapsed
6.586 0.599 7.192
Here's a small example:
"result" matrix
[,1] [,2] [,3] [,4]
[1,] 1 6 1 1
[2,] 6 1 5 3
[3,] 1 5 4 4
[4,] 2 3 4 2
structure(c(1, 6, 1, 2, 6, 1, 5, 3, 1, 5, 4, 4, 1, 3, 4, 2), .Dim = c(4L,4L))
After the code application, I expect in the x matrix:
[,1] [,2] [,3] [,4]
[1,] 4 0 1 2
[2,] 0 4 0 0
[3,] 0 0 4 2
[4,] 0 0 0 4
Upvotes: 3
Views: 540
Reputation: 89107
This is about 100 times faster (50ms) on my machine using your 100-by-100 result
matrix:
for (i in 1:ncol(result))
for (j in i:ncol(result))
x[i, j] <- sum(result[, i] == result[, j])
And this is about 200 times faster, but maybe a bit harder to understand:
x <- apply(result, 2, function(y)colSums(result == y))
x[lower.tri(x)] <- 0
If it is still not fast enough for your taste, I would look if this exact function is not already implemented in one of the many distance packages, or try a Rcpp
implementation. Although I'm not sure you'll get a lot more out, as my suggestions already use a fair amount of vectorization.
Upvotes: 5