Reputation: 764
For the following two matrices, I would like to find the mean of each column. This is easy when the number of rows and columns is small.
yy=matrix(c(1:40), nrow = 10, ncol = 4)
tt=c(1:8)
yy_new=matrix(NA, nrow = 10, ncol=length(tt))
yy_new1=matrix(NA, nrow = 10, ncol=length(tt))
dim(yy_new)
for (it in 1:10) {
  for (tim in 1:8) {
    yy_new[it, tim] = yy[it, 1] + yy[it, 3] * tt[tim]
    yy_new1[it, tim] = yy[it, 2] + yy[it, 4] * tt[tim] + 2
  }
}
yy_new_mean=apply(yy_new,2,mean) #column wise mean of the first matrix
yy_new1_mean=apply(yy_new1,2,mean)
If the number of rows and columns is very large, say 10,000 rows and 2,000 columns, it takes too much time to fill the two matrices inside the loop (yy_new and yy_new1). Can I do this more efficiently so that the computation does not take so long?
Upvotes: 0
Views: 124
Reputation: 490
You can use the function outer to create matrices of the results you want:
yy_new <- outer(1:10, 1:8, function(x,y){
yy[x,1]+yy[x,3]*tt[y]
})
yy_new1 <- outer(1:10, 1:8, function(x,y){
yy[x,2]+yy[x,4]*tt[y]+2
})
That's much faster than a for loop. In general, in R you want to avoid for loops when a vectorized alternative exists, since most functions are vectorized.
Comparing both options using microbenchmark, it's about 100 times faster:
expr      min       lq        mean        median    uq        max       neval
for loop  6207.115  6601.342  7691.66462  6868.801  7215.776  45110.99  100
outer     27.152    30.855    50.98553    56.066    61.532    195.35    100
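As a further sketch (not part of the benchmark above, and not timed here), the index lookups inside the function can probably be dropped by calling outer on the data columns directly and relying on R's recycling to add the intercept column. Since each entry is linear in tt, the column means can also be written without building the matrices at all. The names yy_new_mean_direct and yy_new1_mean_direct are made up for this illustration.

```r
yy <- matrix(1:40, nrow = 10, ncol = 4)
tt <- 1:8

# outer(yy[, 3], tt)[i, j] is yy[i, 3] * tt[j].
# Adding the length-10 vector yy[, 1] recycles it down each column,
# so row i gets yy[i, 1] added, matching the loop in the question.
yy_new  <- yy[, 1] + outer(yy[, 3], tt)
yy_new1 <- yy[, 2] + outer(yy[, 4], tt) + 2

# colMeans is the dedicated function for column-wise means.
yy_new_mean  <- colMeans(yy_new)
yy_new1_mean <- colMeans(yy_new1)

# Because each entry is a + b * tt[j], the mean of column j is
# mean(a) + mean(b) * tt[j], so the full matrices are not needed
# if only the column means are wanted.
yy_new_mean_direct  <- mean(yy[, 1]) + mean(yy[, 3]) * tt
yy_new1_mean_direct <- mean(yy[, 2]) + mean(yy[, 4]) * tt + 2
```

If only the means are needed, the last two lines work on vectors of length 10 and 8 instead of a 10 x 8 matrix, which should matter more as the dimensions grow to the sizes mentioned in the question.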
Upvotes: 1