Reputation: 400
I would like to get rid of the summation over $t$ and implement that in R. So far, I have this
Z_diff = mapply(FUN = function(i) {Z-Z[i]}, i=1:n, SIMPLIFY = TRUE)
rss = 0
for(t in 1:n){
rss = rss + t(Y - X %*% B[,t]) %*% (diag(c(Z_diff[,t])) %*% (Y - X %*% B[,t]))
}
Upvotes: 1
Views: 483
Reputation: 269734
Below we define a test input and compute the rss using the original formula in the question. Then we show a matrix formula that gives the same result. If you write out the formulas for the components of the matrix in the sum of the last line the reason for the result should be clear.
# input
set.seed(123)
n <- 4
Z <- rnorm(n)
Y <- rnorm(n)
B <- matrix(rnorm(n*n), n)
X <- matrix(rnorm(n*n), n)
# scalar sum
rss <- 0
for(i in 1:n)
for(t in 1:n)
rss <- rss + (Y[i] - c( crossprod(X[i, ], B[, t])))^2 * (Z[t] - Z[i])
rss
## [1] 43.39269
# matrix sum
ZZ <- t(outer(Z, Z, "-"))
sum(ZZ * (Y - X%*%B)^2)
## [1] 43.39269
With the data above the matrix formulation runs about 300x faster on my machine.
library(microbenchmark)
microbenchmark(
scalar = {
rss <- 0
for(i in 1:n)
for(t in 1:n)
rss <- rss + (Y[i] - c( crossprod(X[i, ], B[, t])))^2 * (Z[t] - Z[i])
rss
},
matrix = {
ZZ <- t(outer(Z, Z, "-"))
sum(ZZ * (Y - X%*%B)^2)
})
+ })
## Unit: microseconds
## expr min lq mean median uq max neval cld
## scalar 34428.2 36966.35 44244.423 38257.80 40968.6 407679.6 100 b
## matrix 80.8 82.45 148.764 156.45 177.8 721.7 100 a
Upvotes: 3