FriskySaga

Reputation: 439

Cython - Replacement for def __init__() method since Cython's Python functions and methods cannot handle unsigned char arrays with values of 0

Hi all. In the Cython example below I have an unsigned char array, a, filled with unsigned integers. When I pass this array into a Python def method, the value at every index after the first one containing 0 gets corrupted.

In this example, since the value 0 is the 6th element (index 5), all subsequent elements of the array passed into the __cinit__() method have incorrect values. The same happens with the __init__() method and with any function or method declared with def.

However, when the array is passed into any cdef or cpdef function or method, the values of the array are correct.

So, I have two questions (and note that I am using a .pyx runner file):

  1. Am I passing the array into the __cinit__() method incorrectly? Is there another way to do it?
  2. Alternatively, is there a Cythonic way of replacing the def __cinit__() method? Of course, I could work around the problem with cdef or cpdef methods, especially for an example as simple as this one, but I would like to learn whether there is a different way.

Code:

cdef class Classical:
    def __cinit__(self, unsigned char *b):
        for x in range(0, 12):
            print b[x], " init" # This does not work

    cdef void bar(self, unsigned char *b):
        for x in range(0, 12):
            print b[x], " method" # This works fine

def foo(unsigned char *b):
    for x in range(0, 12):
        print b[x], " function" # This does not work either

cdef unsigned char a[12]
a = [
    83,
    12,
    85,
    31,
    7,
    0,
    91,
    11,
    0,
    12,
    77,
    100
]
Classical(a).bar(a)
foo(a)

Output:

83  init
12  init
85  init
31  init
7  init
0  init
0  init
0  init
0  init
0  init
0  init
0  init
83  method
12  method
85  method
31  method
7  method
0  method
91  method
11  method
0  method
12  method
77  method
100  method
83  function
12  function
85  function
31  function
7  function
0  function
100  function
0  function
0  function
0  function
0  function
0  function

Upvotes: 1

Views: 135

Answers (1)

ead

Reputation: 34367

All arguments of a def-function are Python objects. A char * (the same holds for unsigned char *) isn't a Python object, however it is possible to automatically convert (some) Python objects to char *. So

def foo(char *x):
   ...

means for Cython: check that the passed Python object can be converted to a cdef char *, perform the conversion and use the result of this conversion in the body of the function.

When a def-function is called with a char * as argument (see also this somewhat related SO-post):

cdef char a[12]
....
foo(a) # a decays to char *

Cython performs the following: it assumes the char * is a null-terminated C string, automatically converts it to a temporary bytes-object, and passes this temporary bytes-object to the def-function foo.

That means in your case:

  • calling foo(a) creates a temporary bytes-object of length 5 (and not 12, because the 6th element is 0), into which the first 5 characters are copied.
  • inside the function foo, the address of this bytes-object's buffer is taken and used as unsigned char *b, which now has only 6 elements (including the trailing \0), so accessing b[6] or anything beyond it is undefined behavior and could end in a segmentation fault.
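The truncation described in the two points above can be reproduced outside Cython. What follows is a minimal plain-Python sketch using ctypes, not Cython's actual conversion code. It only relies on the fact that reading a buffer as a null-terminated C string stops at the first zero byte. The names data, buf, truncated and full are illustrative and do not appear in the original code.

```python
import ctypes

# Same 12 values as in the question; zeros sit at indices 5 and 8.
data = bytes([83, 12, 85, 31, 7, 0, 91, 11, 0, 12, 77, 100])

# A C buffer holding the 12 bytes (ctypes appends one trailing NUL).
buf = ctypes.create_string_buffer(data)

# Read as a null-terminated C string: copying stops at the first zero byte.
truncated = ctypes.string_at(buf)

# Read with an explicit length: all 12 bytes are copied, zeros included.
full = ctypes.string_at(buf, len(data))

print(len(truncated), list(truncated))
print(len(full), list(full))
```

The first read yields only the bytes in front of the first zero, which mirrors what the def-function receives when no length is given.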

You can verify that a and b point to different addresses via

print("Address:", <unsigned long long>(&a[0])) # or &b[0]

So the actual problem is that, when you call foo, only the part of the array before the first 0 is converted to the temporary bytes-object, not the whole array. The conversion from/to char * is described in the Cython documentation. In your case the call should be:

foo(a[:12]) #pass the length explicitly, so cython doesn't have to depend on '\0'

and now the following is printed:

83  function
12  function
85  function
31  function
7  function
0  function
91  function
11  function
0  function
12  function
77  function
100  function
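Because a def-function ends up with a bytes-object anyway, it can also check the length itself instead of assuming 12 elements. Here is a minimal sketch in plain Python, under the assumption that the caller passes a bytes-object such as a[:12]. The function name foo_checked and the parameter expected_len are hypothetical and not part of the original code.

```python
def foo_checked(data, expected_len=12):
    """Return one output line per byte, refusing input of unexpected length."""
    if not isinstance(data, bytes):
        raise TypeError("expected a bytes object")
    if len(data) != expected_len:
        # A silently truncated conversion shows up here as a short input.
        raise ValueError("expected %d bytes, got %d" % (expected_len, len(data)))
    # bytearray() yields integers when iterated, so zeros are ordinary values.
    return ["%d  function" % value for value in bytearray(data)]


full = bytes(bytearray([83, 12, 85, 31, 7, 0, 91, 11, 0, 12, 77, 100]))
for line in foo_checked(full):
    print(line)
```

With this check, a call that lost everything after the first zero raises a ValueError instead of reading past the end of the buffer.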

The situation is different for cdef-functions, where char * stays char * and isn't converted to a Python-object. However, __cinit__ must be a def function and thus in this case usually a cdef-factory function is used, as in the answer pointed out by @DavidW, e.g.:

cdef class Classical:
    ...
    @staticmethod
    cdef Classical create(char* ptr):
        obj = <Classical>Classical.__new__(Classical) # __init__ isn't called!
        # set up obj while using ptr
        ...
        return obj

Obviously, Classical.create can only be used from Cython code, but on the other hand only Cython code has pointers!
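The comment "__init__ isn't called!" in the factory above can be illustrated with a plain-Python analogue. This is only a sketch of the pattern, not the Cython code itself: it uses a classmethod and a bytes argument in place of a cdef static method and a pointer, and the attribute names values and init_called are hypothetical.

```python
class Classical(object):
    def __init__(self):
        # Only runs when the class is called directly: Classical()
        self.init_called = True
        self.values = []

    @classmethod
    def create(cls, raw):
        # __new__ allocates the instance without running __init__.
        obj = cls.__new__(cls)
        # Set up the object from the raw data; zeros are kept as values.
        obj.values = list(bytearray(raw))
        return obj


obj = Classical.create(b"\x53\x0c\x00\x5b")
print(obj.values)
print(hasattr(obj, "init_called"))
```

The object built by the factory carries the data it was given, and the attribute that only __init__ would set is absent.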

Upvotes: 2
