Xavier Merino
Xavier Merino

Reputation: 669

Precision is lost / changed once data goes to Cython module

I ported some code that was using NumPy into Cython to get some performance boost. I achieved a considerable boost but I'm experiencing one issue.

The results obtained by Cython differ from the ones obtained by Python. I have no clue why this is happening so I decided to see what was getting pushed to the Cython module.

Before it gets to Cython the data looks like:

azimuth = 0.000349065850399 

rawDistance = [ 2.682  7.234  2.8    7.2    2.912  7.19   3.048  7.174  3.182  7.162
  3.33   7.164  3.506  7.158  3.706  7.154  3.942  7.158  4.192  7.158
  4.476  7.186  4.826  7.19   5.218  7.204  5.704  7.224  6.256  7.248
  6.97   7.284] 

intensity = [19 34 25 28 26 48 21 56 21 60 31 49 24 37 26 37 34 37 23 84 15 59 23 45 
             18  47 20 55 18 36 15 39]

Once it gets into Cython, this same data looks like:

azimuth = 0.000349065850399 

rawDistance = [2.686, 7.23, 2.7960000000000003, 7.204, 2.91, 7.188, 3.044, 7.174, 3.19, 
               7.16, 3.3280000000000003, 7.16, 3.5, 7.154, 3.704, 7.144, 3.936, 7.158, 
               4.196, 7.156000000000001, 4.478, 7.19, 4.8260000000000005, 7.192, 5.22, 
               7.204, 5.708, 7.22, 6.256, 7.252, 6.97, 7.282] 

intensity = [19, 34, 27, 28, 26, 48, 22, 52, 21, 60, 31, 49, 24, 37, 28, 34, 32, 37, 
             23, 84, 15, 59, 23, 45, 18, 47, 20, 58, 18, 36, 15, 36]

That explains why the results are not exactly the same as the ones calculated by the pure Python method.

This is the Cython module the information is being transferred to:

from libc.math cimport sin, cos
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def calculateXYZ(list frames, double[:] cosVertCorrection, double[:] sinVertCorrection):
    cdef long numberFrames = len(frames)
    cdef long i, j, k, numberBlocks
    cdef list finalResults = []
    cdef list intensities = []
    cdef list frameXYZ = []
    cdef double azimuth, xy, x, y, z, sinRotational, cosRotational
    cdef double[32] rawDistance
    cdef int[32] intensity
    cdef double[:] tempX
    cdef double[:] tempY
    cdef double[:] tempZ
    cdef int positionsFilled = 0

    for i in xrange(numberFrames):
        numberBlocks = len(frames[i])
        tempX = np.zeros(numberBlocks * 32, dtype=np.double)
        tempY = np.zeros(numberBlocks * 32, dtype=np.double)
        tempZ = np.zeros(numberBlocks * 32, dtype=np.double)
        frameXYZ = [[] for i in range(3)]
        positionsFilled = 0

        for j in xrange(numberBlocks):
            # This is where I tested for the data in Cython
            # This is the information that is different. 
            # It is reading from what was passed to it from python.

            azimuth = frames[i][j][0]
            rawDistance = frames[i][j][1]
            intensity = frames[i][j][2]
            sinRotational, cosRotational = sin(azimuth), cos(azimuth)

            for k in xrange(32):
                xy = rawDistance[k] * cosVertCorrection[k]
                x, y = xy * sinRotational, xy * cosRotational
                z = rawDistance[k] * sinVertCorrection[k]

                if x != 0 or y != 0 or z != 0:
                    tempX[positionsFilled] = x
                    tempY[positionsFilled] = y
                    tempZ[positionsFilled] = z
                    intensities.append(intensity[k])
                    positionsFilled = positionsFilled + 1

        frameXYZ[0].append(np.asarray(tempX[0:positionsFilled].copy()).tolist())
        frameXYZ[1].append(np.asarray(tempY[0:positionsFilled].copy()).tolist())
        frameXYZ[2].append(np.asarray(tempZ[0:positionsFilled].copy()).tolist())
        finalResults.append(frameXYZ)

    return finalResults, intensities

This is the pure Python version of it:

documentXYZ = []
intensities = []

# I tested to see what the original data was in here adding prints

for frame in frames:
    frameXYZ = [[] for i in range(3)]
    frameX, frameY, frameZ = [], [], []
    for block in frame:
        sinRotational, cosRotational = np.math.sin(block[0]), np.math.cos(block[0])
        rawDistance, intensity = np.array(block[1]), np.array(block[2])
        xy = np.multiply(rawDistance, cosVertCorrection)
        x, y, z = np.multiply(xy, sinRotational), np.multiply(xy, cosRotational), np.multiply(rawDistance, sinVertCorrection)
        maskXYZ = np.logical_and(np.logical_and(x, x != 0), np.logical_and(y, y != 0), np.logical_and(z, z != 0))
        frameX += x[maskXYZ].tolist()
        frameY += y[maskXYZ].tolist()
        frameZ += z[maskXYZ].tolist()
        intensities += intensity[maskXYZ].tolist()

    frameXYZ[0].append(frameX), frameXYZ[1].append(frameY), frameXYZ[2].append(frameZ)
    documentXYZ.append(frameXYZ)

I understand that there might be (although I think there shouldn't since I'm using doubles in all of the structures) a difference in precision for floating point values but I don't understand why the intensity values that are integers are being changed as well. I would like the precision to be the same as in Python.

Any ideas on how to improve this?

Thanks.

Upvotes: 5

Views: 630

Answers (1)

ely
ely

Reputation: 77484

The first two steps to solving the problem are:

  1. Determine the specific integer type used by NumPy on your platform (e.g. int32, int64 ...), for example by checking the dtype attribute of the integer array or one of its values.

  2. Determine the bit-width of int on your platform with your chosen C implementation. Usually it will be 32 bits, but not always (check with sizeof for example).

Once you know these two details, you can determine in what way a plain (C) int fails to match the integer precision that NumPy had been using. A common guess would be that NumPy is using int64 but in C you're using int which is likely int32 for your platform / implementation. Another common case is that NumPy is using an unsigned integer, while in C int will be signed, leading to different representations even if having the same number of bits.

You can refer to fixed width integers in Cython easily, in at least the following three ways:

  1. Since you have used cimport numpy as np you can refer to NumPy's fixed-width integer types, such as np.int64_t or np.uint8_t. The "_t" typedefs are available in NumPy's Cython support.

  2. You can try to figure out a standard type name from your C implementation and platform, such as possibly cython.longlong for a 64-bit integer or cython.uchar for an unsigned 8-bit integer, which corresponds to exactly the right number of bits and correct signedness for an integer to match the precision and signedness of whichever type had been used by NumPy.

  3. You can also import from the C standard library, such as from libc.stdint import int64_t, uint8_t if you prefer to use C's standard header file for fixed-width integers of a specified size.

Assuming you've chosen the appropriate integer type, you can then declare your intensity array with the proper type, for example any of the following depending on which approach you have chosen for expressing the proper integer type:

cdef np.uint8_t[32] intensity   # If using NumPy integer types
cdef uint8_t[32] intensity      # If importing from libc.stdint
cdef cython.uchar[32] intensity # If using Cython integer types

As a final note, it's good to keep in mind that regular Python integers are infinite precision, so if you manage to get a NumPy array of int types (not C int, but Python int), you'll have to decide on a different, fixed-precision representation when working in Cython, or else work with arrays or typed memoryviews containing Python int types (which generally defeats the purpose of dropping down to Cython in the first place).

Upvotes: 1

Related Questions