Reputation: 1043
I am maximizing a likelihood function and trying to reduce the loop.
I want to add the vector(parameters to be estimated) to all rows of a matrix (data). The length of vector equals to the column of matrix.
a+b
would give the wrong results because the recycle rule of R is by column not row.
a<-c(1,2,0,0,0) # parameters to be optimized
b<-matrix(1,ncol=5,nrow=6) # data
t(a+t(b)) # my code would work, anything more intuitive?
Desired output
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 1 1 1
[2,] 2 3 1 1 1
[3,] 2 3 1 1 1
[4,] 2 3 1 1 1
[5,] 2 3 1 1 1
[6,] 2 3 1 1 1
The wrong output
a+b
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 1 1 1
[2,] 3 1 1 1 2
[3,] 1 1 1 2 3
[4,] 1 1 2 3 1
[5,] 1 2 3 1 1
[6,] 2 3 1 1 1
Upvotes: 13
Views: 12848
Reputation: 8863
Another benchmark:
es=2:7
r=sapply(es,function(e){
m=matrix(rnorm(10*10^e),ncol=10)
v=rnorm(10)
b=microbenchmark(times=10,
t(t(m)+v),
v[col(m)]+m,
m+rep(v,each=nrow(m)),
sweep(m,2,v,"+"),
m+outer(rep(1,nrow(m)),v),
collapse::TRA(m,v,"+"),
Rfast::eachrow(m,v,"+")
)
a=aggregate(b$time,list(b$expr),median)
setNames(a[,2],gsub(" ","",a[,1]))/1e6
})
r2=apply(r,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
r3=apply(rbind(paste0("1e",es),r2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(r3,c("",rownames(r))),1,paste,collapse=" "))
Output (median time in ms, 1e7 means 1e7 rows):
1e2 1e3 1e4 1e5 1e6 1e7
0.0102 0.116 1.10 13.4 96 1478 t(t(m)+v)
0.0053 0.105 0.98 9.3 65 1225 v[col(m)]+m
0.0182 0.197 1.93 19.7 173 2044 m+rep(v,each=nrow(m))
0.0397 0.151 1.16 12.9 93 1431 sweep(m,2,v,"+")
0.0088 0.053 0.46 3.1 41 610 m+outer(rep(1,nrow(m)),v)
0.0049 0.037 0.32 3.1 16 402 collapse::TRA(m,v,"+")
0.0043 0.036 0.31 3.3 13 382 Rfast::eachrow(m,v,"+")
Upvotes: 1
Reputation: 1369
These solutions using outer()
or collapse::TRA()
are is significantly faster than using rep
or sweep
.
Upvotes: 3
Reputation: 887108
We can use col
to replicate the 'a' elements
b + a[col(b)]
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 3 1 1 1
#[2,] 2 3 1 1 1
#[3,] 2 3 1 1 1
#[4,] 2 3 1 1 1
#[5,] 2 3 1 1 1
#[6,] 2 3 1 1 1
Or a faster option would be to use rep
b + rep(a, each = nrow(b))
Or use sweep
sweep(b, 2, a, "+")
set.seed(24)
b <- matrix(sample(0:9, 8000*5000, replace=TRUE), ncol=5000)
a <- sample(0:3, 5000, replace=TRUE)
system.time(b + a[col(b)])
# user system elapsed
# 1.08 0.06 1.14
system.time(b + rep(a, each = nrow(b)))
# user system elapsed
# 0.83 0.03 0.86
system.time(t(a+t(b)))
# user system elapsed
# 1.14 0.03 1.17
system.time(sweep(b, 2, a, "+"))
# user system elapsed
# 0.62 0.06 0.69
Upvotes: 18