Ben S.
Ben S.

Reputation: 167

Incomplete gamma functions: can this code get any faster in cython, C, or Fortran?

As part of a large piece of code, I need to calculate arrays of incomplete gamma functions. For example, I need a function that returns (the log of) (gamma(z + m, a, inf)/m!) for m in [0, m_max], for various values of m_max (typically around 400), z, and a. I need to do this quickly. Currently, this step is the the slowest in my code by around a factor of ~2. However, the full code takes ~a day to run, so reducing the computation time of this step by 2 would save me a lot of wall time.

I am using the following cython code for the calculation:

import numpy as np
cimport numpy as np
from mpmath import mp

sp_max = 5000 

def log_factorial(k):
    return np.sum(np.log(np.arange(1., k + 1., dtype=np.float)))

log_factorial_ary = np.vectorize(log_factorial)(np.arange(sp_max))

gamma_mem = mp.memoize(mp.gamma)
gammainc_mem = mp.memoize(mp.gammainc)

def gammainc_up_fct_ary_log(np.int m_max, np.float z, np.float a):
    cdef np.ndarray gi_list = np.zeros(m_max + 1, dtype=np.float)    
    gi_list[0] = np.float(gammainc_mem(z, a))
    cdef np.ndarray i_array = np.arange(1., m_max + 1., dtype=np.float)
    cdef Py_ssize_t i 
    for i in np.arange(1, m_max + 1):
        gi_list[i] = (i_array[i-1] - 1. + z)*gi_list[i-1]/i + np.exp((i_array[i-1] - 1. + z)*np.log(a) - a - log_factorial_ary[i])
    return gi_list

As an example, when I call gammainc_up_fct_ary_log(400,-0.3,10.0) it takes around ~0.015-0.025 seconds. I would like to speed this up by at least a factor of 2 (or, ideally, as fast as possible).

Is there a clear way to speed up this computation using cython? If not, would C or Fortran be significantly faster? If so, what is the fastest way to write this function in that language and then call the code from python (the rest of my code is written in python/cython).

Thanks in advance.

Upvotes: 2

Views: 717

Answers (1)

serge-sans-paille
serge-sans-paille

Reputation: 2139

There are several big issues in your cython version:

  1. i_array is useless, you can safely replace i_array[i-1] by just i

  2. You're not getting the most of cython. If you have a look to the output of cython -a on your code, you'll see that cython is just generating calls to the C-API, while you need calls to C code to have it run fast.

Here is an example of what you could achieve (incomplete, but the speedup is already great)

import numpy as np
cimport numpy as np
cimport cython
from mpmath import mp

cdef extern from "math.h":
    double log(double x) nogil
    double exp(double x) nogil

sp_max = 5000 

def log_factorial(k):
    return np.sum(np.log(np.arange(1., k + 1., dtype=np.float)))

factorial_ary = np.array([np.float(mp.factorial(m)) for m in np.arange(sp_max)])
log_factorial_ary = np.vectorize(log_factorial)(np.arange(sp_max))

gamma_mem = mp.memoize(mp.gamma)
gammainc_mem = mp.memoize(mp.gammainc)

def gammainc_up_fct_ary_log(m_max, z, a):
    return gammainc_up_fct_ary_log_impl(m_max, z, a)

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cdef gammainc_up_fct_ary_log_impl(int m_max, double z, double a):
    cdef double[::1] gi_list = np.zeros(m_max + 1, dtype=np.float)
    gi_list[0] = gammainc_mem(z, a)
    cdef Py_ssize_t i
    for i in range(1, m_max + 1):
        t0 = (i - 1. + z)
        t1 = (i - 1. + z)*log(a) - a
        gi_list[i] = t0*gi_list[i-1]/i + exp(t1 - log_factorial_ary[i])
    return gi_list

running this code gives me:

python -m timeit -s 'from ff import gammainc_up_fct_ary_log' 'gammainc_up_fct_ary_log(400,-0.3,10.0)'

10000 loops, best of 3: 132 usec per loop

while your version hardly gives:

python -m timeit -s 'from ff import gammainc_up_fct_ary_log' 'gammainc_up_fct_ary_log(400,-0.3,10.0)'

100 loops, best of 3: 2.44 msec per loop

Upvotes: 2

Related Questions