antimuon
antimuon

Reputation: 262

Nested for loop using apply function

I was wondering if there is a way to do the below matrix population using the apply function? I have read that apply is more efficient than for loops. For some reason I struggle with the apply function family.

I am looking to populate a matrix. I am using data in a data frame as conditions of the data population.

Below is a sample of the data logic; it is simplified but it covers the essentials of the population being rule based.

id_1 <- c(1, 1, 1, 1, 1, 1)
id_2 <- c(1, 1, 2, 2, 3, 3)
id_3 <- c(1, 2, 2, 3, 3, 4)
amt <- c(10, 15, 20, 25, 30, 35)

sample_data <- data.frame(id_1, id_2, id_3, amt)

n <- length(sample_data)

cor <- matrix(ncol = n, nrow = n)

i <- 1
j <- 1

for (i in 1:n) {
  for (j in 1:n) {
    if (i == j) {
      cor[i,j] = 1
    } else if (sample_data[2][i,] == sample_data[2][j,] & sample_data[3][i,] != sample_data[3][j,]) {
      cor[i,j] = 0
    } else if (sample_data[2][i,] != sample_data[2][j,] & sample_data[3][i,] == sample_data[3][j,]) {
      cor[i,j] = 0.5
    } else {
      cor[i,j] = 0.25
    }
  }
}

cor

     [,1] [,2] [,3] [,4]
[1,] 1.00 0.00 0.25 0.25
[2,] 0.00 1.00 0.50 0.25
[3,] 0.25 0.50 1.00 0.00
[4,] 0.25 0.25 0.00 1.00

Upvotes: 0

Views: 119

Answers (1)

josliber
josliber

Reputation: 44340

apply is not more efficient than a for loop, so that's not a good approach if you are looking for efficiency. Instead, you should use vectorized operations. Let's take apart your for loop:

First, elements take value 1 if they are on the diagonal, which can be achieved with the diag function:

diag(n)
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    0    1    0    0
# [3,]    0    0    1    0
# [4,]    0    0    0    1

Off-diagonal entries (i, j) take value 0.5 if the entry i and j in the second column of sample_data don't match and if entries i and j in the third column of sample_data match. This can be achieved with the vectorized outer function:

topn.2 <- head(sample_data[,2], n)
topn.3 <- head(sample_data[,3], n)
0.5 * (outer(topn.2, topn.2, "!=") & outer(topn.3, topn.3, "=="))
#      [,1] [,2] [,3] [,4]
# [1,]    0  0.0  0.0    0
# [2,]    0  0.0  0.5    0
# [3,]    0  0.5  0.0    0
# [4,]    0  0.0  0.0    0

Off-diagonal entries (i, j) take value 0.25 if entry i and j match in either both columns 2 and 3 or in neither. Again, this can be achieved with outer:

0.25 * (outer(1:n, 1:n, "!=") & (outer(topn.2, topn.2, "==") + outer(topn.3, topn.3, "==")) != 1)
#      [,1] [,2] [,3] [,4]
# [1,] 0.00 0.00 0.25 0.25
# [2,] 0.00 0.00 0.00 0.25
# [3,] 0.25 0.00 0.00 0.00
# [4,] 0.25 0.25 0.00 0.00

Adding everything together yields a fully vectorized replacement to the for loop:

diag(n) +
  0.5 * (outer(topn.2, topn.2, "!=") & outer(topn.3, topn.3, "==")) +
  0.25 * (outer(1:n, 1:n, "!=") & (outer(topn.2, topn.2, "==") + outer(topn.3, topn.3, "==")) != 1)
#      [,1] [,2] [,3] [,4]
# [1,] 1.00 0.00 0.25 0.25
# [2,] 0.00 1.00 0.50 0.25
# [3,] 0.25 0.50 1.00 0.00
# [4,] 0.25 0.25 0.00 1.00

Upvotes: 3

Related Questions