Anis

Reputation: 3094

Arrays allocated at same address Cython + Numpy

I came across some odd memory behavior while working with numpy + cython, trying to get the data of a numpy array as a C array for use in a GIL-free function. I've taken a look at both the cython and numpy array APIs, but I haven't found an explanation. Consider the following lines of code:

cdef np.float32_t *a1 = <np.float32_t *>np.PyArray_DATA(np.empty(2, dtype="float32"))
print "{0:x}".format(<unsigned int>a1)
cdef np.float32_t *a2 = <np.float32_t *>np.PyArray_DATA(np.empty(2, dtype="float32"))
print "{0:x}".format(<unsigned int>a2)

I allocate two numpy arrays with numpy's empty function, and want to retrieve a pointer to the data buffer for each of them. You would expect these two pointers to point to two different memory addresses on the heap, possibly spaced by 2*4 bytes. But no, I get pointers to the same memory address, e.g.

>>>96a7aec0
>>>96a7aec0

How come? I managed to work around this by declaring my numpy arrays outside of the PyArray_DATA call; in that case, I get what I expect.

The only explanation I can think of is that I don't create any Python object outside the scope of the PyArray_DATA call, and calling this function doesn't increment Python's reference count. The GC therefore reclaims the memory right afterwards, and the next array is allocated at the previous, now free, memory address. Could somebody more cython-savvy than me confirm that or give another explanation?

Upvotes: 2

Views: 615

Answers (1)

oz1

Reputation: 998

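You create two temporary numpy arrays, and they happen to be allocated at the same address. Since no Python references are kept for them, each one is freed as soon as its reference count drops to zero, which is right after the statement that created it. The second array can then reuse the freed memory, and both a1 and a2 become dangling pointers.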

If references are kept for them, both arrays are alive at the same time and their addresses cannot be the same, e.g.:

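# The memoryview holds a reference, which keeps the numpy array alive.
# dtype=np.intc matches the C int element type of the int[:] memoryview.
cdef int[:] a = np.arange(10, dtype=np.intc)
cdef int[:] b = np.arange(10, dtype=np.intc)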
cdef int* a_ptr = &a[0]
cdef int* b_ptr = &b[0]
print(<size_t>a_ptr)
print(<size_t>b_ptr)
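
The lifetime rule behind this can be checked from plain Python, without Cython or numpy. The following is only a minimal sketch that assumes CPython's reference counting. `Buffer` is a hypothetical stand-in for any object that owns memory, and the two helper function names are made up for illustration. A weak reference does not keep its target alive, so it reports whether the object still exists after the statement that created it.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for an object that owns memory (not a numpy type)."""

    def __init__(self, size):
        self.data = bytearray(size)


def temporary_survives():
    # The Buffer exists only as a temporary: once this statement ends,
    # no strong reference is left, so CPython frees it right away.
    ref = weakref.ref(Buffer(8))
    return ref() is not None


def named_object_survives():
    # Binding the object to a name keeps a strong reference to it
    # for as long as the name is in scope.
    keep_me = Buffer(8)
    ref = weakref.ref(keep_me)
    return ref() is not None


if __name__ == "__main__":
    print("temporary alive:", temporary_survives())
    print("named alive:", named_object_survives())
```

Under CPython the first function returns False and the second returns True. A raw pointer taken from the temporary would outlive its object in just the same way.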

Great care must be taken when using an object's underlying data. If it is used incorrectly, one often ends up with a dangling pointer, e.g.:

void cfunc(const char*)
# Fortunately, this won't compile in cython. 
# Error: Storing unsafe C derivative of temporary Python reference
cdef const char* s = ("won't" + " compile").encode()
cfunc(s)

Right way:

# Make sure keep_me stays alive until cfunc has finished with it.
cdef bytes keep_me = ("right" + "way").encode()
cfunc(keep_me)
# Or for single use.
cfunc(("right" + "way").encode())

Another example in c++ std::string's member c_str():

// The temporary produced by `+` is destroyed at the end of this statement, so cfunc gets a dangling pointer.
const char * s = (string("not") + string("good")).c_str();
cfunc(s); 

Right way:

// keep `keep_me` for later use.
string keep_me = string("right") + string("way"); 
cfunc(keep_me.c_str());
// Or, for single use.
cfunc((string("right") + string("way")).c_str());

Reference: std::string::c_str() and temporaries

Upvotes: 2
