Reputation: 5691
There is a function in the exactci package that I'd like to pass matrices to and get a matrix back. As it stands, every argument must be a vector of length 1. I dug into the source and found this piece, the function I actually use (shown here with the arguments modified and reduced):
exact.binom.minlike <- function(d1, d2, e1, e2, relErr = 1 + 1e-7){
  x <- round(d1)
  n <- x + round(d2)
  p <- e1 / (e1 + e2)
  support <- 0:n
  f <- dbinom(support, n, p)
  d <- f[support == x]
  # relErr is a small tolerance so that ties with d are included
  sum(f[f <= d * relErr])
}
(This returns a p-value for a two-sided test of equality of Poisson rates using the minlike method.)
I see that the reason I can't pass in a matrix and get back a matrix is the vector support that gets created inside. I stripped the dbinom() part down to the following:
f <- exp( lfactorial(n) -
          (lfactorial(support) + lfactorial(n - support)) +
          support * log(p) +
          (n - support) * log(1 - p)
        )
This gives back the same vector, f, and is even a bit faster, but it doesn't solve my problem; at least, I don't see a way to avoid using support as a vector. The length of support varies with d1 + d2, so I'm stuck making comparisons one at a time. The best I've been able to do is wrap the whole thing in Vectorize(), which takes matrices just fine as arguments but returns a vector instead of a matrix:
exact.binom.minlike.stripped <- Vectorize(compiler::cmpfun(function(d1, d2, e1, e2, relErr = 1 + 10 ^ (-7)){
  x <- round(d1)
  n <- x + round(d2)
  p <- e1 / (e1 + e2)
  support <- 0:n
  # dbinom() is the binomial probability mass function:
  #   choose(n, k) * p ^ k * (1 - p) ^ (n - k)
  # computed here on the log scale and then exponentiated
  f <- exp( lfactorial(n) -
            (lfactorial(support) + lfactorial(n - support)) +
            support * log(p) +
            (n - support) * log(1 - p)
          )
  # f <- dbinom(support, n, p)
  d <- f[support == x]
  sum(f[f <= d * relErr])
}))
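A quick check of the claim that the log-factorial expression reproduces dbinom() can be sketched as follows. The values of n and p are made up for illustration and are not taken from the data in this post.

```r
# Illustrative values only (not from the post's data).
n <- 25
p <- 0.3
support <- 0:n

# log of choose(n, k) * p^k * (1 - p)^(n - k), exponentiated at the end.
f_manual <- exp(lfactorial(n) -
                (lfactorial(support) + lfactorial(n - support)) +
                support * log(p) +
                (n - support) * log(1 - p))

f_dbinom <- dbinom(support, n, p)

# TRUE if the two vectors agree up to floating-point tolerance.
print(isTRUE(all.equal(f_manual, f_dbinom)))
```

all.equal() compares with a numerical tolerance, which is the appropriate test here since the two routes round differently.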
Here's an example:
set.seed(1)
d1 <- matrix(rpois(36,lambda = 100), 6)
d2 <- matrix(rpois(36,lambda = 150), 6)
e1 <- matrix(rpois(36,lambda = 10000), 6)
e2 <- matrix(rpois(36,lambda = 25000), 6)
The output is a vector of length 36 instead of a 6x6 matrix, even though all four inputs were 6x6 matrices:
(p.vals <- exact.binom.minlike.stripped(d1, d2, e1, e2))
[1] 1.935277e-04 9.680425e-08 1.508232e-08 1.227176e-04 1.656111e-02
[6] 2.310620e-04 2.871150e-05 4.024025e-06 4.804943e-05 1.619866e-02
[11] 3.610596e-02 1.101247e-04 5.153746e-04 1.350891e-04 8.663191e-06
[16] 1.384378e-05 2.681715e-06 4.556092e-08 2.270317e-04 2.040001e-04
[21] 3.330344e-01 4.775055e-05 2.588667e-07 5.647732e-04 1.615861e-03
[26] 2.438345e-03 2.524692e-04 3.398664e-05 2.001322e-05 4.361194e-03
[31] 3.909116e-05 1.697943e-03 8.543677e-07 2.992653e-05 2.617216e-04
[36] 3.106748e-03
I gather I can add dim()s to this and turn it back into a matrix:
dim(p.vals) <- dim(d1)
but that seems second best. Can I make Vectorize() give back a matrix with the same dimensions as the arguments passed to it? Even better, is there a way to properly vectorize what I'm doing here and avoid hidden for loops altogether (Vectorize() uses mapply())?
[[Edit]] Thanks Pete for the great suggestions. Here's a comparison using data closer in dimension to what I'm actually doing:
set.seed(1)
N <- 110
d1 <- matrix(rpois(N^2,lambda = 1000), N)
d2 <- matrix(rpois(N^2,lambda = 1500), N)
e1 <- matrix(rpois(N^2,lambda = 10000), N)
e2 <- matrix(rpois(N^2,lambda = 25000), N)
system.time(exact.binom.minlike.stripped.2(d1, d2, e1, e2))
user system elapsed
16.353 1.112 17.635
system.time(exact.binom.minlike.stripped.3(d1, d2, e1, e2))
user system elapsed
14.685 0.016 14.715
system.time({
(p.vals <- exact.binom.minlike.stripped(d1, d2, e1, e2))
(dim(p.vals) <- dim(d1))
})
user system elapsed
12.541 0.040 12.604
I watched my system monitor for memory usage during these runs, and only exact.binom.minlike.stripped.2() is a memory hog. If I were to use it on my real data, where max(n) can get 10-20 times larger, my computer would choke. exact.binom.minlike.stripped.3() does not have this problem, but for some reason it's not quite as fast as exact.binom.minlike.stripped(). Compiling it did not make it run any faster on my system.
[[Edit 2]]: on the same data, Pete's new exact.binom.minlike.stripped.3() does the job in:
user system elapsed
6.468 0.032 6.513
Thus the latter strategy, pre-calculating the log factorials up to max(n), is a major time-saver. Many thanks, Pete!
Upvotes: 3
Views: 140
Reputation: 2396
I can think of two reasons for wanting a function like this vectorised: convenience or performance.
The following should work for convenience, but I suspect that if max(n) is very large then all the memory allocation will offset any gains from vectorising the dbinom call.
exact.binom.minlike.stripped.2 <- function(d1, d2, e1, e2, relErr = 1 + 1e-7) {
  x <- round(d1)
  n <- x + round(d2)
  p <- e1 / (e1 + e2)
  # `dbinom` is already vectorised.
  d <- dbinom(x, n, p)
  # rearrange inputs to `dbinom` so that it works with `outer`.
  dbinom.rearrange <- function(n, x, p) dbinom(x, n, p)
  # one shared support for all cells; dbinom() returns 0 where support > n.
  support <- 0:max(n)
  f <- outer(n, support, dbinom.rearrange, p=p)
  # repeat `d` enough times to conform with `f`.
  d <- array(d, dim(f))
  f[f > d * relErr] <- 0
  # extract the required sums.
  apply(f, c(1,2), sum)
}
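To see why the memory cost grows with max(n), here is a rough back-of-the-envelope estimate of the size of the array f built by outer(). The matrix dimensions and the value of max(n) are assumed, illustrative figures, not measurements.

```r
# Rough size of the array `f`: one double (8 bytes) per
# (row, column, support value) combination.
# All three figures below are assumptions chosen for illustration.
n_rows <- 110
n_cols <- 110
max_n  <- 2700   # assumed: d1 + d2 with Poisson rates near 1000 and 1500

n_cells <- n_rows * n_cols * (max_n + 1)
size_mb <- n_cells * 8 / 1024^2

# Approximate size of `f` alone, in megabytes.
print(round(size_mb))
```

This counts only f itself. The expanded copy of d and the intermediate vectors created by outer() and by the comparison are of similar size, so peak usage is probably a multiple of this figure, and it scales linearly with max(n).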
Or, a possibly more sensible way to do it: use natural vectorisation as far as that will go, and limit Vectorize to the "unnatural" part. This still requires repairing the dimensions at the end.
# Vectorise over d, n and p only; `ftable` is passed whole to every call.
vector.f <- Vectorize(function(d, n, p, ftable) {
  x <- 0:n
  # ftable[k+1] holds lfactorial(k), hence the +1 offsets.
  f <- exp( ftable[n+1] - (ftable[x+1] + ftable[n-x+1]) + x*log(p) + (n-x)*log(1-p) )
  sum(f[f <= d])
}, c('d', 'n', 'p'))
exact.binom.minlike.stripped.3 <- function(d1, d2, e1, e2, relErr = 1 + 1e-7) {
  x <- round(d1)
  n <- x + round(d2)
  p <- e1 / (e1 + e2)
  # `dbinom` is already vectorised.
  d <- dbinom(x, n, p)
  # precompute log factorials once for all cells
  ftable <- lfactorial(0:max(n))
  f <- vector.f(d * relErr, n, p, ftable)
  # Vectorize() drops the dimensions, so restore them.
  dim(f) <- dim(d1)
  return(f)
}
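The table lookup with its +1 offsets can be checked in isolation. This is a minimal sketch with a made-up n, comparing the lookup against base R's lchoose().

```r
# Illustrative value only.
n <- 12
ftable <- lfactorial(0:n)   # ftable[k + 1] holds log(k!)

k <- 0:n
# log(choose(n, k)) assembled from table lookups; the +1 offsets are needed
# because R vectors are 1-indexed while factorial arguments start at 0.
log_choose_lookup <- ftable[n + 1] - (ftable[k + 1] + ftable[n - k + 1])

# TRUE if the lookup agrees with lchoose() up to floating-point tolerance.
print(isTRUE(all.equal(log_choose_lookup, lchoose(n, k))))
```

The point of the table is that lfactorial() is evaluated once for 0:max(n) instead of once per cell.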
Both of these come out about the same speed on my laptop for your example, although one or the other may be faster depending on the actual size of your problem and your hardware.
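As far as I know, Vectorize() has no option to keep the dimensions of its inputs, so the repair step has to live somewhere. One way to hide it is a small wrapper around mapply(). The helper name mapply_keep_dim and the toy function sum_to are hypothetical, made up for this sketch. It assumes the first argument defines the shape and that the function returns one value per element.

```r
# Hypothetical helper (not part of base R or exactci): apply `fun`
# element-wise with mapply() and copy dim/dimnames from the first argument.
mapply_keep_dim <- function(fun, first, ...) {
  out <- mapply(fun, first, ..., SIMPLIFY = TRUE, USE.NAMES = FALSE)
  # Only restore the shape when there is exactly one value per input element.
  if (!is.null(dim(first)) && is.atomic(out) && length(out) == length(first)) {
    dim(out) <- dim(first)
    dimnames(out) <- dimnames(first)
  }
  out
}

# Toy scalar function standing in for the p-value calculation:
# it builds a vector whose length depends on `a`, then reduces it.
sum_to <- function(a, b) sum(0:a) + b

m <- matrix(1:6, nrow = 2)
res <- mapply_keep_dim(sum_to, m, 10)
print(dim(res))
```

This is still a hidden loop, just like Vectorize(). It only tidies up the dimension handling and does nothing for speed.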
Upvotes: 1