Reputation: 85
I have an R function that computes the probability of an integer k, k = 1, ..., m, from given values of the other parameters. When m is very large (e.g., m = 10,000), the function is very slow. Do you have any suggestions for improving its performance?
I also want to create an equivalent function in C++ if needed, so that I can use the Rcpp package from R, but I do not know C++. Before learning C++ from scratch, I would like to have your suggestions.
prob <- function(k, et, ey, nrep = 10000, m0, m1)
{
  m <- m0 + m1
  t <- rnorm(nrep, et, 1)   # Monte Carlo draws of t ~ N(et, 1)
  p0 <- pnorm(-t)
  p1 <- pnorm(ey - t)
  # mean and variance under each branch (et == 0 or not)
  mean0 <- (m0 - 1) * p0 + m1 * p1 + 1
  mean1 <- m0 * p0 + (m1 - 1) * p1 + 1
  var0 <- (m0 - 1) * p0 * (1 - p0) + m1 * p1 * (1 - p1)
  var1 <- m0 * p0 * (1 - p0) + (m1 - 1) * p1 * (1 - p1)
  # average the normal density over the nrep draws
  prob <- ifelse(et == 0, mean(dnorm(k, mean0, sqrt(var0))),
                          mean(dnorm(k, mean1, sqrt(var1))))
  return(prob)
}
Apply the function:
prob_k <- sapply(1:10000, prob, et=1, ey=1 ,m0 = 5000, m1 = 5000)
Upvotes: 0
Views: 148
Reputation: 7685
Since dnorm is a vectorized function, you can just call prob like
prob_k <- prob(1:10000, et = 1, ey = 1 ,m0 = 5000, m1 = 5000)
To get vectorized output, you will have to change the code of the function a bit, though:
prob <- function(k, et, ey, nrep = 10000, m0, m1)
{
  m <- m0 + m1
  t <- rnorm(nrep, et, 1)
  p0 <- pnorm(-t)
  p1 <- pnorm(ey - t)
  mean0 <- (m0 - 1) * p0 + m1 * p1 + 1
  mean1 <- m0 * p0 + (m1 - 1) * p1 + 1
  var0 <- (m0 - 1) * p0 * (1 - p0) + m1 * p1 * (1 - p1)
  var1 <- m0 * p0 * (1 - p0) + (m1 - 1) * p1 * (1 - p1)
  if (et == 0)
    dnorm(k, mean0, sqrt(var0))
  else
    dnorm(k, mean1, sqrt(var1))
}
This is pretty fast on my machine (about 5 milliseconds on average):
library(microbenchmark)
microbenchmark(prob_k <- prob(1:10000, et = 1, ey = 1, m0 = 5000, m1 = 5000))
# Unit: milliseconds
#                                                           expr      min       lq     mean   median       uq      max neval
#  prob_k <- prob(1:10000, et = 1, ey = 1, m0 = 5000, m1 = 5000) 4.68232 4.776912 5.168405 4.817979 4.907612 7.023989   100
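Regarding the C++/Rcpp route: the computation also translates naturally to a plain loop in C++. Below is a standalone C++ sketch, not actual Rcpp code; the function name prob_cpp is my own, rnorm/pnorm/dnorm are re-implemented with the standard library so the file compiles on its own, and only the et != 0 branch is shown for brevity. Unlike the vectorized R version, which pairs each k with a single draw of t when length(k) equals nrep, this sketch keeps the original question's Monte Carlo average over all nrep draws for every k. Porting it to Rcpp would mainly mean swapping std::vector for Rcpp::NumericVector and using R's own RNG.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Standalone C++ sketch of the prob() computation (et != 0 branch assumed).
// For each k, averages the normal density over nrep Monte Carlo draws of t.
std::vector<double> prob_cpp(const std::vector<int>& k, double et, double ey,
                             int nrep, int m0, int m1) {
    std::mt19937 gen(42);                              // fixed seed for reproducibility
    std::normal_distribution<double> rnorm(et, 1.0);   // t ~ N(et, 1)
    auto pnorm = [](double x) {                        // standard normal CDF
        return 0.5 * std::erfc(-x / std::sqrt(2.0));
    };
    const double inv_sqrt_2pi = 0.3989422804014327;    // 1 / sqrt(2 * pi)
    auto dnorm = [=](double x, double mu, double sd) { // normal density
        double z = (x - mu) / sd;
        return inv_sqrt_2pi / sd * std::exp(-0.5 * z * z);
    };
    std::vector<double> out(k.size(), 0.0);
    for (int r = 0; r < nrep; ++r) {                   // Monte Carlo loop
        double t  = rnorm(gen);
        double p0 = pnorm(-t);
        double p1 = pnorm(ey - t);
        double mean1 = m0 * p0 + (m1 - 1) * p1 + 1;
        double sd1   = std::sqrt(m0 * p0 * (1 - p0) + (m1 - 1) * p1 * (1 - p1));
        for (std::size_t i = 0; i < k.size(); ++i)     // accumulate the average
            out[i] += dnorm(k[i], mean1, sd1) / nrep;
    }
    return out;
}
```

The inner loop over k is the hot path; keeping it as a simple indexed loop lets the compiler vectorize it, which is where the speedup over the original sapply call comes from.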
Upvotes: 1