tim riffe

Reputation: 5691

make Vectorize() pass dims OR properly vectorize this function

There is a function in the exactci package that I'd like to pass arguments to as matrices and get back a matrix. As it is, all arguments can only be vectors of length 1. I dug into the source and found this piece, the function I actually use (here with arguments modified and reduced):

exact.binom.minlike <- function(d1, d2, e1, e2, relErr = 1 + 1e-7){
    x           <- round(d1)
    n           <- x + round(d2)
    p           <- e1 / (e1 + e2)

    support     <- 0:n
    f           <- dbinom(support, n, p)
    d           <- f[support == x]

    sum(f[f <= d * relErr])
}

(This returns a p-value for a two-sided test of equality of Poisson rates using the minlike method.)

I see that the reason I can't pass in a matrix and get back a matrix is because of the vector support that gets created inside. I stripped down the dbinom() part to the following:

f           <- exp( lfactorial(n) - 
                    (lfactorial(support) + lfactorial(n - support)) + 
                    support * log(p) + 
                    (n - support) * log(1 - p)
                   )

This gives back the same vector f, and is even a bit faster, but it doesn't solve my problem: I don't see a way around using support as a vector. The length of support varies with d1 + d2, so I'm stuck making comparisons one at a time. The best I've been able to do is wrap the whole thing in Vectorize(), which accepts matrices as arguments just fine but returns a vector instead of a matrix:

exact.binom.minlike.stripped <- Vectorize(compiler::cmpfun(function(d1, d2, e1, e2, relErr = 1 + 1e-7){
    x           <- round(d1)
    n           <- x + round(d2)
    p           <- e1 / (e1 + e2)

    support     <- 0:n

    # where dbinom() is the prob mass function:
    # n choose k * p ^ k * (1 - p) ^ (n - k) # log it to strip down, then exp it
    f           <- exp( lfactorial(n) - 
                        (lfactorial(support) + lfactorial(n - support)) + 
                        support * log(p) + 
                        (n - support) * log(1 - p)
                       )
    #f          <- dbinom(support, n, p)
    d           <- f[support == x]

    sum(f[f <= d * relErr])
}))
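As a quick sanity check on the log-factorial form, here is a minimal sketch comparing it with dbinom() for one arbitrary pair of values (n = 25 and p = 0.3 are made up for illustration, not taken from my data):

```r
# Compare the log-factorial form of the binomial pmf with dbinom().
# n and p are arbitrary illustration values.
n       <- 25
p       <- 0.3
support <- 0:n

f_log <- exp( lfactorial(n) -
              (lfactorial(support) + lfactorial(n - support)) +
              support * log(p) +
              (n - support) * log(1 - p)
            )
f_ref <- dbinom(support, n, p)

# Agreement up to floating-point tolerance, not bit-for-bit equality.
isTRUE(all.equal(f_log, f_ref))
```

The two forms agree only up to floating-point error, which is presumably why the comparison f <= d * relErr carries a small tolerance rather than testing f <= d exactly.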

Here's an example:

set.seed(1)
d1 <- matrix(rpois(36,lambda = 100), 6)
d2 <- matrix(rpois(36,lambda = 150), 6)
e1 <- matrix(rpois(36,lambda = 10000), 6)
e2 <- matrix(rpois(36,lambda = 25000), 6)

The output is a vector of length 36 instead of a 6x6 matrix, even though all four inputs were 6x6 matrices:

(p.vals <- exact.binom.minlike.stripped(d1, d2, e1, e2))
 [1] 1.935277e-04 9.680425e-08 1.508232e-08 1.227176e-04 1.656111e-02
 [6] 2.310620e-04 2.871150e-05 4.024025e-06 4.804943e-05 1.619866e-02
[11] 3.610596e-02 1.101247e-04 5.153746e-04 1.350891e-04 8.663191e-06
[16] 1.384378e-05 2.681715e-06 4.556092e-08 2.270317e-04 2.040001e-04
[21] 3.330344e-01 4.775055e-05 2.588667e-07 5.647732e-04 1.615861e-03
[26] 2.438345e-03 2.524692e-04 3.398664e-05 2.001322e-05 4.361194e-03
[31] 3.909116e-05 1.697943e-03 8.543677e-07 2.992653e-05 2.617216e-04
[36] 3.106748e-03

I gather I can add dim()s to this and make it back into a matrix:

dim(p.vals) <- dim(d1)

but that seems second best. Can I make Vectorize() give back a matrix of the same dimensions as the arguments passed to it? Even better, is there a way to properly vectorize what I'm doing here and avoid hidden for loops altogether (Vectorize() uses mapply())?
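For what it's worth, the dim() repair can at least be hidden in a small wrapper. This is only a sketch: vectorize_keep_dim is a hypothetical helper name, not something from base R or exactci, and it assumes the first argument carries the dimensions you want back. It still loops via mapply() underneath.

```r
# Hypothetical helper: behaves like Vectorize(), but copies dim and dimnames
# from the first argument onto the result when the lengths match.
vectorize_keep_dim <- function(FUN, ...) {
    vf <- Vectorize(FUN, ...)
    function(...) {
        args  <- list(...)
        out   <- do.call(vf, args)
        first <- args[[1L]]
        if (!is.null(dim(first)) && length(out) == length(first)) {
            dim(out)      <- dim(first)
            dimnames(out) <- dimnames(first)
        }
        out
    }
}

# Toy scalar-only function: 0:a only makes sense for a single value of a.
toy <- vectorize_keep_dim(function(a, b) sum(0:a) * b)

m1  <- matrix(1:6, 2)
m2  <- matrix(6:1, 2)
res <- toy(m1, m2)
dim(res)
```

Here res comes back as a 2x3 matrix, matching m1, rather than a vector of length 6.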

[[Edit]] Thanks Pete for the great suggestions. Here's a comparison using data closer in dimension to what I'm actually doing:

set.seed(1)
N  <- 110
d1 <- matrix(rpois(N^2,lambda = 1000), N)
d2 <- matrix(rpois(N^2,lambda = 1500), N)
e1 <- matrix(rpois(N^2,lambda = 10000), N)
e2 <- matrix(rpois(N^2,lambda = 25000), N)

system.time(exact.binom.minlike.stripped.2(d1, d2, e1, e2))
   user  system elapsed 
 16.353   1.112  17.635
system.time(exact.binom.minlike.stripped.3(d1, d2, e1, e2))
   user  system elapsed 
 14.685   0.016  14.715 
system.time({
        (p.vals <- exact.binom.minlike.stripped(d1, d2, e1, e2))
        (dim(p.vals) <- dim(d1))
    })
   user  system elapsed 
 12.541   0.040  12.604 

I watched my system monitor for memory usage during these runs, and only exact.binom.minlike.stripped.2() is a memory hog. If I used it on my real data, where max(n) can be 10-20 times larger, my computer would choke. exact.binom.minlike.stripped.3() does not have this problem, but for some reason it's not quite as fast as exact.binom.minlike.stripped(). Compiling (3) did not make it run any faster on my system.

[[Edit 2]]: On the same data, Pete's new exact.binom.minlike.stripped.3() does the job in:

   user  system elapsed 
  6.468   0.032   6.513 

Thus the latter strategy, pre-calculating the log factorials up to max(n), is a major time-saver. Many thanks, Pete!

Upvotes: 3

Views: 140

Answers (1)

pete

Reputation: 2396

I can think of two reasons for wanting a function like this vectorised: convenience or performance.

The following should work for convenience, but I suspect that if max(n) is very large then all the memory allocation will offset any gains from the vectorisation of the dbinom call.

exact.binom.minlike.stripped.2 <- function(d1, d2, e1, e2, relErr = 1 + 1e-7) {

    x <- round(d1)
    n <- x + round(d2)
    p <- e1 / (e1 + e2)

    # `dbinom` is already vectorised.
    d <- dbinom(x, n, p)

    # rearrange inputs to `dbinom` so that it works with `outer`.
    dbinom.rearrange <- function(n, x, p) dbinom(x, n, p) 
    support <- 0:max(n)
    f <- outer(n, support, dbinom.rearrange, p=p)

    # repeat `d` enough times to conform with `f`.
    d <- array(d, dim(f))
    f[f > d * relErr] <- 0

    # extract the required sums.
    apply(f, c(1,2), sum) 
}
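A side note on the last step of this version: the array built by outer() holds length(n) * (max(n) + 1) doubles, so memory grows quickly with max(n). The final apply() can probably be swapped for rowSums(..., dims = 2), which sums over the third dimension without an R-level loop. A minimal sketch on a made-up array (the 6 x 6 x 50 shape and the random values are only for illustration):

```r
# Summing a 3-d array over its third dimension in two equivalent ways.
set.seed(1)
f <- array(runif(6 * 6 * 50), dim = c(6, 6, 50))

s_apply   <- apply(f, c(1, 2), sum)   # one R-level call to sum() per cell
s_rowsums <- rowSums(f, dims = 2)     # treats the first 2 dims as "rows"

isTRUE(all.equal(s_apply, s_rowsums))
dim(s_rowsums)
```

This does not reduce the memory needed for the array itself, only the time spent collapsing it.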

Or, a possibly more sensible way to do it: use natural vectorisation as far as that will go, and limit Vectorize to the "unnatural" part. This still requires repairing the dimensions at the end.

vector.f <- Vectorize(function(d, n, p, ftable) {

    x <- 0:n
    f <- exp( ftable[n+1] - (ftable[x+1] + ftable[n-x+1]) + x*log(p) + (n-x)*log(1-p) )
    sum(f[f <= d])

}, c('d', 'n', 'p'))

exact.binom.minlike.stripped.3 <- function(d1, d2, e1, e2, relErr = 1 + 1e-7) {

    x <- round(d1)
    n <- x + round(d2)
    p <- e1 / (e1 + e2)

    # `dbinom` is already vectorised.
    d <- dbinom(x, n, p)

    # precompute factorials
    ftable <- lfactorial(0:max(n))

    f <- vector.f(d * relErr, n, p, ftable)
    dim(f) <- dim(d1)

    return(f)
}

Both of these come out about the same speed on my laptop for your example, although one or the other may be faster depending on the actual size of your problem and your hardware.
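To make the precomputed-table idea concrete: ftable[k + 1] holds log(k!), so the three lookups inside vector.f reproduce the log binomial coefficient without calling lfactorial() again for every cell. A minimal check, with n_max and n chosen arbitrarily for illustration:

```r
# Table of log-factorials, computed once: ftable[k + 1] is log(k!).
n_max  <- 200
ftable <- lfactorial(0:n_max)

# Log binomial coefficients for one n, read from the table.
n <- 150
x <- 0:n
log_choose_table <- ftable[n + 1] - (ftable[x + 1] + ftable[n - x + 1])

# Should match base R's lchoose() up to floating-point tolerance.
isTRUE(all.equal(log_choose_table, lchoose(n, x)))
```

The +1 offsets are needed because R vectors are 1-indexed while the factorial arguments start at 0.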

Upvotes: 1
