Drishti

Reputation: 43

Loss of Precision when doing Addition of Doubles in Cython

I'm currently trying to optimize Python code with Cython. I need the output to be exactly the same, but I'm having trouble with precision. From what I understand, Python has unlimited precision, and Cython's 'double' is equivalent to a Python float. I'm struggling with the function below (I'm not allowed to share the real code - this is a dummy function with similar structure):

cimport numpy as c_np

def dummyfunction(c_np.ndarray[double, ndim=1] dummyarray, int a, int b,
                  const c_np.uint8_t[:, :] dummyimg):
    cdef double q = 0.111
    cdef double w = 0.222
    cdef double e = 0.333
    cdef double[:] dummyview = dummyarray
    cdef int i, j
    cdef int r, g, bl  # 'bl' rather than 'b', which would shadow the parameter
    for i in range(a):
        for j in range(b):
            r = dummyimg[j, 0]
            g = dummyimg[j, 1]
            bl = dummyimg[j, 2]
            dummyarray[i * b + j] = (
                q * r
                + w * g
                + e * bl
            )
    dummyarray[:] = dummyview  # I'm updating a class attribute in place

I've tried printing 'q*r', 'w*g' and 'e*b'. The precision of these products is the same as in Python! The trouble is with adding the three values together: the sum comes out with only three decimal places. I suspect it's because in most cases one of the three components being summed only has up to three decimal places (e.g. 35.879999999999995, 51.068999999999996, 9.348). Python seems to sum them to much higher precision, though (i.e. 96.29699999999998 vs 96.297).

Any advice?

Upvotes: 1

Views: 686

Answers (1)

Drew Hall

Reputation: 29055

First, Python only has arbitrary precision for integer math. For floating point math, a Python float is an IEEE double-precision (64-bit) value, just as a Cython double is.
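You can confirm this from the standard library (a minimal sketch; nothing here is specific to your code):

```python
import struct
import sys

# A Python float carries a 53-bit significand, exactly as IEEE 754
# double precision specifies.
print(sys.float_info.mant_dig)  # 53

# Packing a float as a C double yields exactly 8 bytes (64 bits).
print(len(struct.pack('<d', 96.297)))  # 8
```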

Assuming you're on an x86 (or x86-64) platform, there are a couple of likely culprits. The x86 architecture offers two different instruction sets for floating point math. The classical path uses the x87 instruction set, and all calculations are actually done with 80-bit (aka "long double") precision. When a value (intermediate or final) is stored to memory, it is truncated to 64-bit precision. As long as it remains in an FPU register, it retains full 80-bit precision.

The other available instruction set uses the so-called SSE (Streaming SIMD Extensions), which can operate on multiple operands simultaneously. However, these calculations are done only with the "strict" precision of the type (64 bits in this case).

My guess is that Python chooses one code path, while Cython chooses the other. Equally likely, they both choose the same instruction set (most likely the SSE one), but they add the intermediate products in different orders. Because of limited precision, the ordering of the sums affects the accuracy of the result.
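The ordering effect is easy to demonstrate in plain Python (a minimal sketch with classic example values, not the ones from your function):

```python
# Floating-point addition is not associative: summing the same three
# terms in a different order can change the low-order bits.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```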

Also, note that in either case, the computation will be done with AT LEAST 64-bit precision. There is no computation that "only has 3 decimal places" as you said. And remember, as always, that the computation is done in binary floating point, not decimal. The true difference between the values is likely to be only a few low-order bits.
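You can check this with the two sums from your question (a sketch that reinterprets each double's raw bits; for positive doubles, consecutive bit patterns correspond to consecutive representable values, so the integer difference counts units in the last place):

```python
import struct

def to_bits(x):
    # Reinterpret the 64-bit double as a signed integer.
    return struct.unpack('<q', struct.pack('<d', x))[0]

a = 96.29699999999998
b = 96.297

# The two results are adjacent (or nearly adjacent) doubles: they
# differ only in the last unit(s) in the last place.
print(to_bits(b) - to_bits(a))
```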

Upvotes: 3
