Sergio
Sergio

Reputation: 727

How to do a reduction with numpy.nditer in the first axis

I'm trying to understand how to work with nditer to do a reduction, in my case converting a 3d array into a 2d array.

I followed the help here http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html and managed to create a function that applies reduction over the last axis of the input. With this function

def nditer_sum(data, red_axes):
    it = numpy.nditer([data, None],
            flags=['reduce_ok', 'external_loop'],
            op_flags=[['readonly'], ['readwrite', 'allocate']],
            op_axes=[None, red_axes])
    it.operands[1][...] = 0

    for x, y in it:
        y[...] = x.sum()

    return it.operands[1]

I can get something equivalent to data.sum(axis=2)

>>> data = numpy.arange(2*3*4).reshape((2,3,4))
>>> nditer_sum(data, [0, 1, -1])
[[ 6 22 38]
[54 70 86]]
>>> data.sum(axis=2)
[[ 6 22 38]
[54 70 86]]

So to get something equivalent to data.sum(axis=0) I though that it was enough to change the argument red_axes to [-1, 0,1] But the result is quite different.

>>> data = numpy.arange(2*3*4).reshape((2,3,4))
>>> data.sum(axis=0)
[[12 14 16 18]
 [20 22 24 26]
 [28 30 32 34]]
>>> nditer_sum(data, [-1, 0, 1])
[[210 210 210 210]
 [210 210 210 210] 
 [210 210 210 210]]

In the for loop inside nditer_sum (for x,y in it:), the iterator is looping 2 times and giving an array of length 12 each time, instead of looping 12 times and giving an array of length 2 each time. I have read the numpy documentation several times and googled about this to no avail. I'm using numpy 1.6 and python 2.7

Upvotes: 10

Views: 2705

Answers (2)

hpaulj
hpaulj

Reputation: 231510

The axis=0 case works correctly if the nditer order is changed to F. Now there are 12 steps with arrays of size (2,) as you wanted.

it = np.nditer([data, None],
        flags=['reduce_ok', 'external_loop'],
        op_flags=[['readonly'], ['readwrite', 'allocate']],
        order='F',     # ADDED to loop starting with the last dimension
        op_axes=[None, red_axes])

But there isn't a solution like this for middle axis=1 case.


Another approach to iterating on selected dimensions is to construct a 'multi_index' iterator on a reduced dimensional array. I discovered in https://stackoverflow.com/a/25097271/901925 that np.ndindex uses this trick to perform a 'shallow iteration'.

For the axis=0 case, this function works:

def sum_1st(data):
    y = np.zeros(data.shape[1:], data.dtype)
    it = np.nditer(y, flags=['multi_index'])
    while not it.finished:
        xindex = tuple([slice(None)]+list(it.multi_index))
        y[it.multi_index] = data[xindex].sum()
        it.iternext()
    return y

Or generalized to any axis:

def sum_any(data, axis=0):
    yshape = list(data.shape)
    del yshape[axis]
    y = np.zeros(yshape, data.dtype)
    it = np.nditer(y, flags=['multi_index'])
    while not it.finished:
        xindex = list(it.multi_index)
        xindex.insert(axis, slice(None))
        y[it.multi_index] = data[xindex].sum()
        it.iternext()
    return y

Upvotes: 1

Benjamin
Benjamin

Reputation: 11860

Changing the line y[...] = x.sum() to y[...] += x fixes it (as in the example here).

Upvotes: 0

Related Questions