John

Reputation: 23758

Vectorizing code and stuck but good

Here are some sample starting values for variables in the code below.

sd <- 2
sdtheory <- 1.5
meanoftheory <- 0.6
obtained <- 0.8
tails <- 2

I'm trying to vectorize the following code. It is a component of a Bayes factor calculator that was originally written by Dienes and adapted to R by Danny Kaye & Thom Baguley. This part calculates the likelihood for the theory. Vectorizing has sped it up massively, but I can't match the output of the bit below.

area <- 0
theta <- meanoftheory - 5 * sdtheory
incr <- sdtheory / 200
for (A in -1000:1000){
    theta <- theta + incr
    dist_theta <- dnorm(theta, meanoftheory, sdtheory)
    if (identical(tails, 1)) {
        if (theta <= 0) {
            dist_theta <- 0
        } else {
            dist_theta <- dist_theta * 2
        }
    }
    height <- dist_theta * dnorm(obtained, theta, sd)
    area <- area + height * incr
}
area

And below is the vectorized version.

incr <- sdtheory / 200
newLower <- meanoftheory - 5 * sdtheory + incr
theta <- seq(newLower, by = incr, length.out = 2001)
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
if (tails == 1) {
    dist_theta <- dist_theta[theta > 0] * 2
    theta <- theta[theta > 0]
}
height <- dist_theta * dnorm(obtained, theta, sd)
area <- sum(height * incr)
area

This code reproduces the results of the original exactly when tails is 2. Everything here so far should copy, paste, and run, giving the exact same results. However, once tails is 1 the vectorized version no longer matches exactly. As near as I can tell, the new if statement does the equivalent of what happens in the original. Any help would be appreciated.

(I did try to create a more minimal example, stripping it down to just the loop and if statements and a small number of slices, but I couldn't get that code to fail.)
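One way to pin down where the two versions diverge is to wrap each in a function and compare them for both values of tails. The sketch below does that. The function names area_loop and area_vec are made up for illustration, and the bodies are just the two snippets above with the sample starting values as defaults.

```r
# Hypothetical wrappers around the two snippets above, for comparison only.
area_loop <- function(tails, sd = 2, sdtheory = 1.5,
                      meanoftheory = 0.6, obtained = 0.8) {
  area <- 0
  theta <- meanoftheory - 5 * sdtheory
  incr <- sdtheory / 200
  for (A in -1000:1000) {
    theta <- theta + incr  # running sum, as in the original
    dist_theta <- dnorm(theta, meanoftheory, sdtheory)
    if (identical(tails, 1)) {
      dist_theta <- if (theta <= 0) 0 else dist_theta * 2
    }
    area <- area + dist_theta * dnorm(obtained, theta, sd) * incr
  }
  area
}

area_vec <- function(tails, sd = 2, sdtheory = 1.5,
                     meanoftheory = 0.6, obtained = 0.8) {
  incr <- sdtheory / 200
  theta <- seq(meanoftheory - 5 * sdtheory + incr, by = incr, length.out = 2001)
  dist_theta <- dnorm(theta, meanoftheory, sdtheory)
  if (tails == 1) {
    dist_theta <- dist_theta[theta > 0] * 2
    theta <- theta[theta > 0]
  }
  sum(dist_theta * dnorm(obtained, theta, sd) * incr)
}

# Print both results side by side for each setting of tails.
for (tails in c(2, 1)) {
  cat("tails =", tails,
      " loop =", format(area_loop(tails), digits = 15),
      " vectorized =", format(area_vec(tails), digits = 15), "\n")
}
```

Printing with 15 digits makes small discrepancies visible that the default print precision would hide.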

Upvotes: 1

Views: 347

Answers (2)

Aaron - mostly inactive

Reputation: 37764

The original calculation has an error due to floating point arithmetic: adding incr on every iteration causes theta to equal 7.204654e-14 on the pass where it should equal zero. So the loop isn't doing the right thing on that pass; it skips the theta <= 0 branch when it should take it. Your vectorized code does the right thing there (at least, it did with these starting values on my machine).
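The drift can be seen by building the grid both ways and looking at the element that should be zero. This is a minimal sketch using the sample values from the question; the variable names are illustrative, and the exact digits printed may vary by platform.

```r
sdtheory <- 1.5
meanoftheory <- 0.6
incr <- sdtheory / 200

# Grid built by a running sum, as in the original loop.
theta_loop <- numeric(2001)
theta <- meanoftheory - 5 * sdtheory
for (i in seq_along(theta_loop)) {
  theta <- theta + incr
  theta_loop[i] <- theta
}

# Grid built in one call, as in the vectorized version.
theta_seq <- seq(meanoftheory - 5 * sdtheory + incr, by = incr, length.out = 2001)

# Element closest to zero in each grid, and the largest gap between the grids.
near_zero_loop <- theta_loop[which.min(abs(theta_loop))]
near_zero_seq <- theta_seq[which.min(abs(theta_seq))]
max_diff <- max(abs(theta_loop - theta_seq))

print(c(loop = near_zero_loop, seq = near_zero_seq, max_diff = max_diff))
```

Whether each near-zero value lands just above, just below, or exactly on zero decides which branch of the tails == 1 test it takes.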

Your code isn't guaranteed to do the right thing every time either. What seq does is better than adding an increment over and over, but it's still floating point arithmetic. You should probably be checking whether theta is within machine tolerance of zero, perhaps using all.equal or something similar.
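A tolerance check could look something like the sketch below. Since all.equal compares whole objects rather than working elementwise, this version uses an explicit threshold instead. The function name area_tol and the default tolerance of sqrt(.Machine$double.eps) are assumptions for illustration, not part of the original calculator.

```r
# Hypothetical tolerance-aware version of the vectorized code.
area_tol <- function(tails, sd = 2, sdtheory = 1.5,
                     meanoftheory = 0.6, obtained = 0.8,
                     tol = sqrt(.Machine$double.eps)) {
  incr <- sdtheory / 200
  theta <- seq(meanoftheory - 5 * sdtheory + incr, by = incr, length.out = 2001)
  dist_theta <- dnorm(theta, meanoftheory, sdtheory)
  if (tails == 1) {
    # Anything within tol of zero is treated as zero, so it falls in the
    # theta <= 0 case and contributes nothing to the area.
    dist_theta <- ifelse(theta > tol, dist_theta * 2, 0)
  }
  sum(dist_theta * dnorm(obtained, theta, sd) * incr)
}

area_tol(tails = 1)
area_tol(tails = 2)
```

The tolerance only needs to be much smaller than incr and much larger than the accumulated rounding error, so that a grid point meant to be zero is classified the same way regardless of how the grid was built.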

Upvotes: 1

Joshua Ulrich

Reputation: 176668

You're dropping observations where theta==0. That's a problem because the output of dnorm is not zero when theta==0. You need those observations in your output.

Rather than drop observations, a better solution would be to set those elements to zero.

incr <- sdtheory / 200
newLower <- meanoftheory - 5 * sdtheory + incr
theta <- seq(newLower, by = incr, length.out = 2001)
dist_theta <- dnorm(theta, meanoftheory, sdtheory)
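if (tails == 1) {
    # Zero the density where theta < 0 and double it elsewhere (theta == 0 is kept)
    dist_theta <- ifelse(theta < 0, 0, dist_theta) * 2
    # Negative theta values are set to 0; their density is already 0, so they add nothing
    theta[theta < 0] <- 0
}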
height <- dist_theta * dnorm(obtained, theta, sd)
area <- sum(height * incr)
area

Upvotes: 3
