For the following two matrices:
yy=matrix(c(1:40), nrow = 10, ncol = 8)
tt=diag(1:4)
I would like to create a new matrix, yy_new=matrix(NA, nrow = 10, ncol=ncol(tt)), by multiplying the first 4 columns of each row of yy by tt. For example, the first row is yy_new[1,]=yy[1,1:4]%*%tt, the second row is yy_new[2,]=yy[2,1:4]%*%tt, and so on. Finally, I want the mean of each column of yy_new, i.e. yy_new=apply(yy_new,2,mean).
The following loop works, but it is time-consuming for large data sets.
yy_new=matrix(NA, nrow = 10, ncol=ncol(tt))
for ( it in 1:10){        # rows of yy
  for ( tim in 1:4){      # first 4 columns; only the diagonal entry tt[tim, tim] is used
    yy_new[it, tim]=yy[it,tim]*tt[tim,tim]
  }
}
yy_new=apply(yy_new,2,mean)  # mean of each column
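The loop reads only the diagonal entries tt[tim, tim], so it matches yy[it, 1:4] %*% tt only because tt is diagonal. A minimal vectorised sketch for that diagonal case scales each column with sweep() and then takes colMeans(); the function names loop_version and diag_version are hypothetical.

```r
yy <- matrix(c(1:40), nrow = 10, ncol = 8)
tt <- diag(1:4)

# The loop from the question, wrapped so it returns the column means
loop_version <- function(yy, tt) {
  out <- matrix(NA, nrow = nrow(yy), ncol = ncol(tt))
  for (it in seq_len(nrow(yy))) {
    for (tim in seq_len(ncol(tt))) {
      out[it, tim] <- yy[it, tim] * tt[tim, tim]
    }
  }
  apply(out, 2, mean)
}

# Vectorised equivalent, valid only for a diagonal tt:
# multiply column j of the block by tt[j, j], then average each column
diag_version <- function(yy, tt) {
  scaled <- sweep(yy[, seq_len(ncol(tt))], 2, diag(tt), `*`)
  colMeans(scaled)
}

all.equal(loop_version(yy, tt), diag_version(yy, tt))
# [1] TRUE
```

For a non-diagonal tt, the quantity described in the question is yy[, 1:4] %*% tt, which neither the loop nor this shortcut computes.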
Similarly, I want another matrix, yy_new1, computed from the last four columns of yy:
yy_new1=matrix(NA, nrow = 10, ncol=ncol(tt))
How can I do this efficiently with a built-in or custom function? Any help is appreciated.
Here is a shorter (and faster) version for yy_new:
yy_new <- rowMeans(apply(yy[, 1:4], 1, function(row) row %*% tt))
Similarly, for the last 4 columns of yy:
yy_new1 <- rowMeans(apply(yy[, (ncol(yy)-3):ncol(yy)], 1, function(row) row %*% tt))
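Why rowMeans() rather than colMeans() above: apply() stores each row's length-4 result as a column, so the intermediate matrix is 4 x 10, the transpose of yy[, 1:4] %*% tt. A short check, assuming the question's yy and tt (res is an illustrative name):

```r
yy <- matrix(c(1:40), nrow = 10, ncol = 8)
tt <- diag(1:4)

# Each call returns 4 values and apply() puts them in a column,
# so the result has 4 rows (one per column of tt) and 10 columns (one per row of yy)
res <- apply(yy[, 1:4], 1, function(row) row %*% tt)
dim(res)
# [1]  4 10

# Hence rowMeans() of res equals colMeans() of the ordinary matrix product
all.equal(rowMeans(res), colMeans(yy[, 1:4] %*% tt))
# [1] TRUE
```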
Note that rowMeans and colMeans are generally faster than apply(..., 1, mean) and apply(..., 2, mean).
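Going one step further: averaging over rows is linear, so colMeans(Y %*% tt) equals colMeans(Y) %*% tt for any 4 x 4 tt, diagonal or not. A sketch built on that observation (col_means_times_tt is a hypothetical helper name) averages first and multiplies a single length-4 vector, so the 10 x 4 product is never formed:

```r
yy <- matrix(c(1:40), nrow = 10, ncol = 8)
tt <- diag(1:4)

# colMeans(Y %*% tt) == colMeans(Y) %*% tt, so average first, multiply once;
# drop() turns the 1 x 4 result back into a plain vector
col_means_times_tt <- function(Y, tt) drop(colMeans(Y) %*% tt)

yy_new  <- col_means_times_tt(yy[, 1:4], tt)
yy_new1 <- col_means_times_tt(yy[, (ncol(yy) - 3):ncol(yy)], tt)

yy_new
# [1]   5.5  31.0  76.5 142.0
```

For n rows and p columns this needs roughly n*p + p^2 operations instead of n*p^2, and it avoids the per-row function calls inside apply().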
Here are results from a microbenchmark comparison (note that the for_loop expression stops before the final apply(yy_new, 2, mean) step):
library(microbenchmark)
res <- microbenchmark(
rowMeans_apply = {
yy_new = rowMeans(apply(yy[, 1:4], 1, function(row) row %*% tt))
},
for_loop = {
yy_new=matrix(NA, nrow = 10, ncol=ncol(tt))
for ( it in 1:10){
for ( tim in 1:4){
yy_new[it, tim]=yy[it,tim]*tt[tim,tim]
}
}
}
)
res
#Unit: microseconds
# expr min lq mean median uq max neval
# rowMeans_apply 73.148 82.097 116.8959 101.329 123.863 1348.141 100
# for_loop 3985.521 4141.633 5017.9808 4421.285 5020.425 18574.364 100
In response to your comment, you could do something like this:
f <- function(x) rowMeans(apply(x, 1, function(row) row %*% tt))
sapply(split.default(as.data.frame(yy), rep(1:2, each = 4)), f)
# 1 2
#[1,] 5.5 5.5
#[2,] 31.0 31.0
#[3,] 76.5 76.5
#[4,] 142.0 142.0
Explanation: split.default here splits the data.frame into the first 4 and the last 4 columns and stores them as two data.frames in a list; sapply then loops through the list elements and calculates the requested quantity for each. The resulting output object is a matrix.
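The same result can be sketched without converting yy to a data.frame by splitting the column indices instead. This assumes ncol(yy) is a multiple of ncol(tt); blocks and out are illustrative names, not part of the answer above.

```r
yy <- matrix(c(1:40), nrow = 10, ncol = 8)
tt <- diag(1:4)

# Group column indices into consecutive blocks of width ncol(tt):
# list(`1` = 1:4, `2` = 5:8) for this yy
blocks <- split(seq_len(ncol(yy)), ceiling(seq_len(ncol(yy)) / ncol(tt)))

# For each block j, average the rows of yy[, j] %*% tt (the quantity f() computes)
out <- sapply(blocks, function(j) colMeans(yy[, j, drop = FALSE] %*% tt))
out
```

out should reproduce the 4 x 2 matrix shown above, with columns named "1" and "2"; a wider yy simply yields more blocks without changing the code.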