Reputation: 851
My function works, but it is very slow when I have large dataset.
What can I do to speed it up? I know we should avoid using double for-loop, but I don't know why.
Thanks so much!
n <- 3
wr <- c(0.9, 0.6, 0.5)
mat <- matrix(1:9, nrow=3, byrow=TRUE)
tmp <- matrix(nrow = n, ncol = n)
out <- rep(0, n)
colsum <- apply(mat, 2, sum)
for (i in 1:n) {
for (j in 1:n) {
tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
}
}
for (i in 1:n) {
out[i] <- 1-sum(tmp[1:n,i])
}
Upvotes: 2
Views: 153
Reputation: 5798
Base R one-liner:
1-colSums(t(t(prop.table(mat, 2)) * (1 - wr)))
Or using sweep:
1-rowSums(t(sweep(mat, 2, "/", STATS = colSums(mat))) * (1 - wr))
Data:
n <- 3
wr <- c(0.9, 0.6, 0.5)
mat <- matrix(1:9, nrow=3, byrow=TRUE)
Upvotes: 1
Reputation: 7734
Using apply
can speed things up here:
colsum <- apply(mat, 2, sum)
1 - rowSums(apply(mat, 1, function(x) (x / colsum)*(1-wr)))
We can clearly see a difference in with the microbenchmark
package and using a larger n
:
n <- 1000
wr <- rep(c(0.9, 0.6, 0.5), length.out=n)
mat <- matrix(1:(n^2), nrow=n, byrow=TRUE)
tmp <- matrix(nrow = n, ncol = n)
out <- rep(0, n)
colsum <- apply(mat, 2, sum)
microbenchmark(
for_loops = {
for (i in 1:n) {
for (j in 1:n) {
tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
}
}
for (i in 1:n) {
out[i] <- 1-sum(tmp[1:n,i])
}},
apply = {
out = 1 - rowSums(apply(mat, 1, function(x) (x / colsum)*(1-wr)))
},
transpose = {
tmp = t(t(mat) / colsum * (1-wr))
out = 1 - colSums(tmp)
},
rowSums = {
1 - rowSums(t(mat) / colsum * (1-wr))
}
)
Interestingly the transpose approach by @BellmanEqn seems to be faster than using apply, but using rowSums
as proposed by @user20650 instead of a second transpose even tops that on average.
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# for_loops 198.6269 211.68075 246.55071 220.3864 239.66485 476.6462 100 c
# apply 21.7299 23.98720 39.97067 29.9156 33.85995 232.0723 100 b
# transpose 11.1222 11.66100 23.86154 13.6034 19.52560 271.2242 100 a
# rowSums 8.6790 9.32655 14.09392 10.0072 15.18220 171.8077 100 a
Upvotes: 2
Reputation: 799
Try this:
n <- 3
wr <- c(0.9, 0.6, 0.5)
mat <- matrix(1:9, nrow=3, byrow=TRUE)
tmp <- matrix(nrow = n, ncol = n)
out <- rep(0, n)
colsum <- apply(mat, 2, sum)
for (i in 1:n) {
for (j in 1:n) {
tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
}
}
for (i in 1:n) {
out[i] <- 1-sum(tmp[1:n,i])
}
# alternatively:
tmp2 = t(t(mat) / colsum * (1-wr))
out2 = 1 - colSums(tmp)
> tmp
[,1] [,2] [,3]
[1,] 0.008333333 0.05333333 0.08333333
[2,] 0.033333333 0.13333333 0.16666667
[3,] 0.058333333 0.21333333 0.25000000
> out
[1] 0.9 0.6 0.5
> tmp2
[,1] [,2] [,3]
[1,] 0.008333333 0.05333333 0.08333333
[2,] 0.033333333 0.13333333 0.16666667
[3,] 0.058333333 0.21333333 0.25000000
> out2
[1] 0.9 0.6 0.5
Upvotes: 3