user2682863

Reputation: 3218

OverflowError occurs when using cython with a large int

python 3.4, windows 10, cython 0.21.1

I'm compiling this function to C with Cython:

def weakchecksum(data):
   """
   Generates a weak checksum from an iterable set of bytes.
   """
   cdef long a, b, l
   a = b = 0
   l = len(data)
   for i in range(l):
       a += data[i]
       b += (l - i)*data[i]

   return (b << 16) | a, a, b

which produces this error: "OverflowError: Python int too large to convert to C long"

I've also tried declaring them as unsigned long. Which type should I use to work with really large numbers? If the values are too large for a C long, are there any workarounds?

Upvotes: 18

Views: 4755

Answers (2)

shaunc

Reputation: 5611

If you make sure that the calculations happen in C (for instance, declare i as long, and either assign the data element to a cdef'd variable or cast it before the calculation), you won't get this error. Your actual results, though, could vary by platform, depending (potentially) on the exact assembly code generated and on how overflows end up being treated. There are better algorithms for this, as @cod3monk3y has noted (look at the "simple checksums" link).
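
To make the effect of wraparound concrete, here is a rough pure-Python sketch that masks both sums explicitly instead of relying on C overflow behaviour. The 16-bit mask and the name weakchecksum_masked are assumptions for illustration (chosen so the two halves fit the `(b << 16) | a` packing); they are not taken from the question.

```python
MASK16 = 0xFFFF  # assumption: keep each sum to 16 bits so the result fits in 32 bits


def weakchecksum_masked(data):
    """Weak checksum with explicit 16-bit wraparound (hypothetical variant)."""
    a = b = 0
    n = len(data)
    for i, byte in enumerate(data):
        # Masking after every step keeps both sums bounded on any platform.
        a = (a + byte) & MASK16
        b = (b + (n - i) * byte) & MASK16
    return (b << 16) | a, a, b


print(weakchecksum_masked(b"abc"))
```

Because the wraparound is explicit, the result should not depend on the width of long on a given platform.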

Upvotes: 5

Andrew Svetlov

Reputation: 17376

Cython compiles .pyx files to C, so the result depends on the underlying C compiler.

The sizes of integer types in C vary across platforms and operating systems; the C standard only sets minimum ranges and doesn't dictate exact sizes.

However, there are de facto implementation conventions.

Windows, both 32-bit and 64-bit, uses 4 bytes (32 bits) for int and long, and 8 bytes (64 bits) for long long. The difference between Win32 and Win64 is the size of a pointer (32 bits on Win32 and 64 bits on Win64). (See "Data Type Ranges" on MSDN.)

Linux uses another model: int is 32 bits on both linux-32 and linux-64, and long long is always 64 bits. long and pointers vary: 32 bits on linux-32 and 64 bits on linux-64.
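
If you want to check which model your own machine follows, a small sketch with the standard-library ctypes module can report the sizes. The helper name c_integer_sizes is made up for this example, and the numbers reflect the C compiler Python itself was built with, which is usually (but not necessarily) the one Cython will use.

```python
import ctypes


def c_integer_sizes():
    """Return the size in bytes of a few C types as seen by this Python build."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "pointer": ctypes.sizeof(ctypes.c_void_p),
    }


sizes = c_integer_sizes()
for name, size in sizes.items():
    print("{:10s} {} bytes ({} bits)".format(name, size, size * 8))
```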

Long story short: if you need the widest integer type whose size doesn't change across platforms, use long long (or unsigned long long).

The data range for long long is [–9223372036854775808, 9223372036854775807].
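
To get a feel for how much headroom that gives the checksum in the question, here is a rough worst-case estimate in plain Python. It assumes every byte is 0xFF, so b can reach 255 * n * (n + 1) / 2 for n bytes; the helper name max_b is made up for this sketch.

```python
LLONG_MIN = -(2 ** 63)
LLONG_MAX = 2 ** 63 - 1
LONG32_MAX = 2 ** 31 - 1  # signed 32-bit long, as on Windows


def max_b(n):
    """Largest value b can reach in the question's loop for n bytes of 0xFF."""
    # The weights are n, n-1, ..., 1, which sum to n * (n + 1) / 2.
    return 255 * n * (n + 1) // 2


for n in (1000, 4104, 100000, 100000000):
    print(n, max_b(n), max_b(n) <= LONG32_MAX, max_b(n) <= LLONG_MAX)
```

Under that assumption a 32-bit long can overflow after only a few thousand bytes, while long long leaves room for inputs on the order of a hundred million bytes.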

If you need numbers with arbitrary precision, there is the GMP library, the de facto standard for high-precision arithmetic. Python has a wrapper for it called gmpy2.
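
Before reaching for GMP, it may be worth noting that Python's built-in int is already arbitrary precision, and in Cython any variable left without a cdef declaration stays a Python object. A minimal sketch of the question's function with the C typing simply dropped (the name weakchecksum_py is made up here) would never overflow, at the cost of running at Python speed:

```python
def weakchecksum_py(data):
    """Same arithmetic as the question, but with untyped Python ints."""
    a = b = 0
    n = len(data)
    for i, byte in enumerate(data):
        a += byte
        b += (n - i) * byte
    return (b << 16) | a, a, b


data = bytes([255]) * 2000000
combined, a, b = weakchecksum_py(data)
# The combined value no longer fits in a signed 64-bit integer.
print(combined.bit_length(), combined > 2 ** 63 - 1)
```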

Upvotes: 13
