Reputation: 149
I have a function written in cython that computes a certain measure of correlation (distance correlation) via a double for loop:
%%cython -a
import numpy as np

def distances_Matrix(X):
    return np.array([[np.linalg.norm(xi - xj) for xi in X] for xj in X])

def c_dCov(double[:, :] a, double[:, :] b, int n):
    cdef int i
    cdef int j
    cdef double U = 0
    cdef double W1 = n / (n - 1)
    cdef double W2 = 2 / (n - 2)
    cdef double[:] a_M = np.mean(a, axis=1)
    cdef double a_ = np.mean(a)
    cdef double[:] b_M = np.mean(b, axis=1)
    cdef double b_ = np.mean(b)
    for i in range(n):
        for j in range(n):
            if i != j:
                U = U + (a[i][j] + W1*(-a_M[i] - a_M[j] + a_)) * (b[i][j] + W1*(-b_M[i] - b_M[j] + b_))
            else:
                U = U - W2*(W1**2)*(a_M[i] - a_) * (b_M[i] - b_)
    return U / (n*(n - 3))

def c_dCor(X, Y):
    n = len(X)
    a = distances_Matrix(X)
    b = distances_Matrix(Y)
    V_XX = c_dCov(a, a, n)
    V_YY = c_dCov(b, b, n)
    V_XY = c_dCov(a, b, n)
    return V_XY / np.sqrt(V_XX * V_YY)
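(Aside: the double comprehension in distances_Matrix can also be expressed with NumPy broadcasting, which avoids the Python-level loops entirely. A sketch, assuming X is a 2-D array whose rows are the observation vectors; the function name is mine:)

```python
import numpy as np

def distances_matrix_vectorized(X):
    # diff[j, i] = X[j] - X[i]; broadcasting builds all pairwise
    # differences at once, then the norm is taken over the last axis.
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```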
When I compile this fragment of code I get the following report of optimization by the compiler:
Line 23 is still quite yellow, which indicates significant Python interaction. How can I optimize that line further?
The operations on that line are trivial, just products and sums, and I specified the types of every array and variable used in the function, so why does that line perform so badly?
Thanks in advance.
Upvotes: 2
Views: 2665
Reputation: 13997
Short answer: disable bounds checking in the c_dCov function by adding the following decorator on the line right before it:

cimport cython

@cython.boundscheck(False)  # Deactivate bounds checking
def c_dCov(double[:, :] a, double[:, :] b, int n):
Alternatively, you can add a compiler directive to the top of your code. Right after your Cython magic line you would put:
%%cython -a
#cython: language_level=3, boundscheck=False
If you had a setup.py file, you could also turn bounds checking off globally there:
from distutils.core import setup
from Cython.Build import cythonize

setup(
    name="foo",
    ext_modules=cythonize('foo.pyx', compiler_directives={'boundscheck': False}),
)
Regardless of how it was done, disabling bounds checks was by itself enough to get the following optimization report:
Some other optimizations suggested by the Cython docs are turning off indexing with negative numbers, and declaring that your arrays are guaranteed to have a contiguous layout in memory. With all of those optimizations, the signature of c_dCov would become:
cimport cython
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
def c_dCov(double[:, ::1] a, double[:, ::1] b, int n):
but only @cython.boundscheck(False) was needed to get the better optimization report.
Now that I look closer though, even though you don't have those optimizations in your code snippet, you do have the boundscheck(False) and wraparound(False) decorators in the code shown in your optimization report. Did you already try those and they didn't work? What version of Cython are you running? Maybe you need an upgrade.
Every time you access an array by index, a bounds check occurs. This is so that when you have an array arr of shape (5, 5) and you try to access arr[19, 27], your program will raise an error instead of letting you access out-of-bounds data. However, for the sake of speed, some languages don't do bounds checks on array access (e.g. C/C++). Cython lets you optionally turn off bounds checks to improve performance: either globally for a whole program with the boundscheck compiler directive, or for a single function with the @cython.boundscheck(False) decorator.
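A quick pure-Python illustration of what that check buys you; NumPy performs the same kind of validation on every indexed access, which is exactly the work boundscheck(False) removes inside the Cython loop:

```python
import numpy as np

arr = np.zeros((5, 5))
try:
    arr[19, 27]  # indices far outside a (5, 5) array
except IndexError:
    # Python/NumPy raise an error instead of reading arbitrary memory,
    # which is what an unchecked C array access would do.
    print("caught out-of-bounds access")
```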
Upvotes: 2