Pedro Relich
Pedro Relich

Reputation: 23

How do I type my variables in cython so they pass to the memoryview array faster?

I am trying to optimize a double loop for an N-body integrator and I found that the problem with my code is that I'm incurring a massive overhead when I write stored variables into the memory view locations.

I originally had this code vectorized in numpy, but it was called inside another for loop to update the particle positions and the overhead was brutal. I have an np.ndarray Nx2 vector of positions (X) and I want to return an Nx2 vector of momentums (XOut) -- the current code listed below returns a memory view, but that's OK because I'd like to eventually embed this function in other Cython function once I've debugged this bottleneck.

I had tried the cython -a "name.pyx" command and found that I more or less have everything as a C-type. However, I found that towards the bottom of the loop, writing into the memoryview of XOut[ii,0] -= valuex is incurring most of the run time. If I change that into a constant so that XOut[ii,0] -= 5, the code is ~40X faster. I think this means I'm doing some sort of copy operation on that line which is slowing me down. My Cython/C++ backgrounds are rudimentary, but I think I need to change the syntax so that I'm writing into the memoryview from a pointer. Any advice would be greatly appreciative; Thanks!

import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI


DTYPE = np.float64  

ctypedef np.float64_t DTYPE_t  
cdef DTYPE_t pi = 3.141592653589793  

@cython.cdivision(True)  
@cython.boundscheck(False) # turn off bounds-checking for entire function  
@cython.wraparound(False)  # turn off negative index wrapping for entire function

def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):  
    cdef Py_ssize_t ii,jj,N  
    N = X.shape[0]  
    cdef DTYPE_t valuex,valuey,r2,xvec,yvec  
    for ii in range(0,N):  
        for jj in range(ii+1,N):  
            xvec = X[ii,0]-X[jj,0]  
            yvec = X[ii,1]-X[jj,1]  
            r2 = max(xvec**2+yvec**2,epsilon)  
            valuex = xvec/r2**2  
            valuey = yvec/r2**2  
            XOut[ii,0] -= valuex  
            XOut[ii,1] -= 5 #valuey  
            XOut[jj,0] += 5 #valuex  
            XOut[jj,1] += 5 #valuey  
        XOut[ii,0] /= 2*pi  
        XOut[ii,1] /= 2*pi   
    return XOut  

Upvotes: 0

Views: 210

Answers (1)

Pedro Relich
Pedro Relich

Reputation: 23

OK, so the issue was the mathematical operations. Cython doesn't optimize the ** operator so I modified the code:

import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI

DTYPE = np.float64

ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793

@cython.cdivision(True)
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):
    cdef Py_ssize_t ii,jj,N
    N = X.shape[0]
    cdef DTYPE_t valuex,valuey,r2,xvec,yvec
    for ii in range(0,N-1):
        for jj in range(ii+1,N):
            xvec = X[ii,0]-X[jj,0]
            yvec = X[ii,1]-X[jj,1]
            r2 = max(xvec*xvec+yvec*yvec,epsilon)
            valuex = xvec/r2/r2
            valuey = yvec/r2/r2
            XOut[ii,0] -= valuex
            XOut[ii,1] -= valuey
            XOut[jj,0] += valuex
            XOut[jj,1] += valuey
        XOut[ii,0] /= 2*pi
        XOut[ii,1] /= 2*pi 
    return XOut

Changing valuex from xvec/r2**2 to xvec/r2/r2 and removing all instances of the ** operator sped up the loop to 9ms from 200ms for an 1800x2 array. I am still hopeful that a 4ms speed is possible, but I'll settle for 9ms for now.

Upvotes: 1

Related Questions