joon
joon

Reputation: 4017

Possible optimization in cython: numpy array

The following is my Cython code for drawing from multivariate normal distribution. I am using loop because each time I have different density. (conLSigma is the Cholesky factor)

This is taking a lot of time because I am taking inverse and Cholesky decomposition for each loop. It is faster than pure python code, but I was wondering if there is any way I can boost the speed more.

from __future__ import division

import numpy as np 

cimport numpy as np 

ctypedef np.float64_t dtype_t

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)

def drawMetro(np.ndarray[dtype_t, ndim = 2] beta,
              np.ndarray[dtype_t, ndim = 3] H,
              np.ndarray[dtype_t, ndim = 2] Sigma,
              float s):

    cdef int ncons = betas.shape[0]
    cdef int nX = betas.shape[1]
    cdef int con

    cdef np.ndarray betas_cand = np.zeros([ncons, nX], dtype = np.float64)
    cdef np.ndarray conLSigma = np.zeros([nX, nX], dtype = np.float64)

    for con in xrange(ncons):
        conLSigma = np.linalg.cholesky(np.linalg.inv(H[con] + Sigma))
        betas_cand[con] = betas[con] + s * np.dot(conLSigma, np.random.standard_normal(size = nX))

    return(betas_cand)

Upvotes: 3

Views: 1659

Answers (2)

Axel
Axel

Reputation: 1

what about computing the cholesky decomposition first and inverting the lower triangular matrix after by back substitution. This should be faster than linalg.cholesky(linalg.inv(S)).

Upvotes: 0

Justin Peel
Justin Peel

Reputation: 47092

The Cholesky decomposition creates a lower-triangular matrix. This means that close to half of the multiplications done in the np.dot don't need to be done. If you change the line

betas_cand[con] = betas[con] + s * np.dot(conLSigma, np.random.standard_normal(size = nX))

into

tmp = np.random.standard_normal(size = nX)
for i in xrange(nX):
    for j in xrange(i+1):
        betas_cand[con,i] += s * conLSigma[i,j] * tmp[j]

However, you'll also need to change

cdef np.ndarray betas_cand = np.zeros([ncons, nX], dtype = np.float64)

into

cdef np.ndarray betas_cand = np.array(betas)

You could of course use slices for the multiplications, but I'm not sure if it will be faster than the way that I've suggested. Anyway, hopefully you get the idea. I don't think that there is much else that you can do to speed this up.

Upvotes: 5

Related Questions