Reputation: 12567
Let's say I want to implement Numpy's
x[:] += 1
in Cython. I could write
@cython.boundscheck(False)
@cython.wraparoundcheck(False)
def add1(np.ndarray[np.float32_t, ndim=1] x):
cdef unsigned long i
for i in range(len(x)):
x[i] += 1
However, this only works with ndim = 1
. I could use
add1(x.reshape(-1))
but this only works with contiguous x
.
Does Cython offer any reasonably easy and efficient way to iterate Numpy arrays as if they were flat?
(Reimplementing this particular operation in Cython doesn't make sense, as the above Numpy's code should be as fast as it gets -- I'm just using this as a simple example)
UPDATE:
I benchmarked the proposed solutions:
@cython.boundscheck(False)
@cython.wraparound(False)
def add1_flat(np.ndarray x):
cdef unsigned long i
for i in range(x.size):
x.flat[i] += 1
@cython.boundscheck(False)
@cython.wraparound(False)
def add1_nditer(np.ndarray x):
it = np.nditer([x], op_flags=[['readwrite']])
for i in it:
i[...] += 1
The second function requires import numpy as np
in addition to cimport
. The results are:
a = np.zeros((1000, 1000))
b = a[100:-100, 100:-100]
%timeit b[:] += 1
1000 loops, best of 3: 1.31 ms per loop
%timeit add1_flat(b)
1 loops, best of 3: 316 ms per loop
%timeit add1_nditer(b)
1 loops, best of 3: 1.11 s per loop
So, they are 300 and 1000 times slower than Numpy.
UPDATE 2:
The add11
version uses a for
loop inside of a for
loop, and so doesn't iterate the array as if it were flat. However, it is as fast as Numpy in this case:
%timeit add1.add11(b)
1000 loops, best of 3: 1.39 ms per loop
On the other hand, add1_unravel
, one of the proposed solutions, fails to modify the contents of b
.
Upvotes: 3
Views: 1754
Reputation: 231395
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Is a nice tutorial on using nditer
. It ends with a cython
version. nditer
is meant to be the all purpose array(s) iterator in numpy c
level code.
There are also good array examples on the Cython memoryview page:
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html
The data buffer of an ndarray
is a flat buffer. So regardless of the array's shape and strides, you can iterate over the whole buffer in a flat c
pointer fashion. But things like nditer
and memoryview
take care of the element size details. So in c
level coding it is actually easier to step through all the elements in a flat fashion than it is to step by rows - going by rows has to take into account the row stride.
This runs in Python, and I think it will translate nicely into cython (I don't have that setup on my machine at the moment):
import numpy as np
def add1(x):
it = np.nditer([x], op_flags=[['readwrite']])
for i in it:
i[...] += 1
return it.operands[0]
x = np.arange(10).reshape(2,5)
y = add1(x)
print(x)
print(y)
https://github.com/hpaulj/numpy-einsum/blob/master/sop.pyx is a sum-of-products script that I wrote a while back to simulate einsum
.
The core of its w = sop(x,y)
calculation is:
it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
it.operands[nop][...] = 0
it.reset()
for xarr, yarr, warr in it:
x = xarr
y = yarr
w = warr
size = x.shape[0]
for i in range(size):
w[i] = w[i] + x[i] * y[i]
return it.operands[nop]
===================
copying ideas from the nditer.html
document, I got a version of add1
that is only half the speed of the native numpy
a+1
. The naive nditer
(above) isn't much faster in cython
than in python
. A lot of the speedup may be due to the external loop
.
@cython.boundscheck(False)
def add11(arg):
cdef np.ndarray[double] x
cdef int size
cdef double value
it = np.nditer([arg],
flags=['external_loop','buffered'],
op_flags=[['readwrite']])
for xarr in it:
x = xarr
size = x.shape[0]
for i in range(size):
x[i] = x[i]+1.0
return it.operands[0]
I also coded this nditer
in python with a print of size
, and found that it iterated on your b
with 78 blocks of size 8192 - that's a buffer size, not some characteristic of b
and its data layout.
In [15]: a = np.zeros((1000, 1000))
In [16]: b = a[100:-100, 100:-100]
In [17]: timeit add1.add11(b)
100 loops, best of 3: 4.48 ms per loop
In [18]: timeit b[:] += 1
100 loops, best of 3: 8.76 ms per loop
In [19]: timeit add1.add1(b) # for the unbuffered nditer
1 loop, best of 3: 3.1 s per loop
In [21]: timeit add1.add11(a)
100 loops, best of 3: 5.44 ms per loop
Upvotes: 2
Reputation: 6214
With numpy.ravel
numpy.ravel(a, order='C')
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
@cython.boundscheck(False)
@cython.wraparound(False)
def add1_ravel(np.ndarray xs):
cdef unsigned long i
cdef double[::1] aview
narray = xs.ravel()
aview = narray
for i in range(aview.shape[0]):
aview[i] += 1
# return xs if the memory is shared
if not narray.flags['OWNDATA'] or np.may_share_memory(xs, narray):
return xs
# otherwise new array reshaped
shape = tuple(xs.shape[i] for i in range(xs.ndim))
return narray.reshape(shape)
Upvotes: 1
Reputation: 6430
You could try using the flat
attribute of the ndarray
, which provides an iterator over the flattened array object. It always iterates in C-major ordering, with the last index varying the fastest. Something like:
for i in range(x.size):
x.flat[i] += 1
Upvotes: 1