Reputation: 851

How to avoid double for-loop in matrix calculation

My function works, but it is very slow when I have large dataset.

What can I do to speed it up? I know we should avoid using double for-loop, but I don't know why.

Thanks so much!

    n <- 3
    wr <- c(0.9, 0.6, 0.5)
    mat <- matrix(1:9, nrow=3, byrow=TRUE)
    
    tmp    <- matrix(nrow = n, ncol = n)
    out   <- rep(0, n)
    
    colsum <- apply(mat, 2, sum)
    
    for (i in 1:n) {
      for (j in 1:n) {
        tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
      }
    }
    
    for (i in 1:n) {
      out[i] <- 1-sum(tmp[1:n,i])
    }

Upvotes: 2

Answers (3)

hello_friend

Reputation: 5798

Base R one-liner:

1-colSums(t(t(prop.table(mat, 2)) * (1 - wr)))

Or using sweep:

1-rowSums(t(sweep(mat, 2, "/", STATS = colSums(mat))) * (1 - wr))

Data:

n <- 3
wr <- c(0.9, 0.6, 0.5)
mat <- matrix(1:9, nrow=3, byrow=TRUE)

Upvotes: 1

kath

Reputation: 7734

Using apply can speed things up here:

colsum <- apply(mat, 2, sum)
1 - rowSums(apply(mat, 1, function(x) (x / colsum)*(1-wr)))

We can clearly see a difference in with the microbenchmark package and using a larger n:

n <- 1000
wr <- rep(c(0.9, 0.6, 0.5), length.out=n)
mat <- matrix(1:(n^2), nrow=n, byrow=TRUE)

tmp <- matrix(nrow = n, ncol = n)
out <- rep(0, n)

colsum <- apply(mat, 2, sum)
    
microbenchmark(
  for_loops = {
    for (i in 1:n) {
      for (j in 1:n) {
        tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
      }
    }
    
    for (i in 1:n) {
      out[i] <- 1-sum(tmp[1:n,i])
    }}, 
  apply = {
    out = 1 - rowSums(apply(mat, 1, function(x) (x / colsum)*(1-wr)))
  }, 
  transpose = {
    tmp = t(t(mat) / colsum * (1-wr))
    out = 1 - colSums(tmp)
  }, 
  rowSums = {
    1 - rowSums(t(mat) / colsum * (1-wr))
  }
)

Interestingly the transpose approach by @BellmanEqn seems to be faster than using apply, but using rowSums as proposed by @user20650 instead of a second transpose even tops that on average.

# Unit: milliseconds
#      expr      min        lq      mean   median        uq      max neval cld
# for_loops 198.6269 211.68075 246.55071 220.3864 239.66485 476.6462   100   c
#     apply  21.7299  23.98720  39.97067  29.9156  33.85995 232.0723   100  b 
# transpose  11.1222  11.66100  23.86154  13.6034  19.52560 271.2242   100 a  
#   rowSums   8.6790   9.32655  14.09392  10.0072  15.18220 171.8077   100 a

Upvotes: 2

BellmanEqn

Reputation: 799

Try this:

n <- 3
wr <- c(0.9, 0.6, 0.5)
mat <- matrix(1:9, nrow=3, byrow=TRUE)

tmp    <- matrix(nrow = n, ncol = n)
out   <- rep(0, n)

colsum <- apply(mat, 2, sum)

for (i in 1:n) {
  for (j in 1:n) {
    tmp[i, j] <- (mat[i, j]/ colsum[j])*(1-wr[j])
  }
}

for (i in 1:n) {
  out[i] <- 1-sum(tmp[1:n,i])
}

# alternatively:
tmp2 = t(t(mat) / colsum * (1-wr))
out2 = 1 - colSums(tmp)

> tmp
            [,1]       [,2]       [,3]
[1,] 0.008333333 0.05333333 0.08333333
[2,] 0.033333333 0.13333333 0.16666667
[3,] 0.058333333 0.21333333 0.25000000
> out
[1] 0.9 0.6 0.5

> tmp2
            [,1]       [,2]       [,3]
[1,] 0.008333333 0.05333333 0.08333333
[2,] 0.033333333 0.13333333 0.16666667
[3,] 0.058333333 0.21333333 0.25000000
> out2
[1] 0.9 0.6 0.5

Upvotes: 3

How to avoid double for-loop in matrix calculation

Answers (3)

Related Questions