Updating by index in a multi-dimensional numpy array

Question

I am using numpy to tally a lot of values across many large arrays, and keep track of which positions the maximum values appear in.

In particular, imagine I have a 'counts' array:

data = numpy.array([[ 5, 10, 3],
                    [ 6, 9, 12],
                    [13, 3,  9],
                    [ 9, 3,  1],
                    ...
                    ])
counts = numpy.zeros(data.shape, dtype=int)

data is going to change a lot, but I want 'counts' to reflect the number of times the max has appeared in each position:

max_value_indices = numpy.argmax(data, axis=1)
# this is now [1, 2, 0, 0, ...] representing the positions of 10, 12, 13 and 9, respectively.

From what I understand of broadcasting in numpy, I should be able to say:

counts[max_value_indices] += 1

What I expect is the array to be updated:

[[0, 1, 0],
 [0, 0, 1],
 [1, 0, 0],
 [1, 0, 0],
 ...
]

But instead this increments ALL the values in counts, giving me:

[[1, 1, 1],
 [1, 1, 1],
 [1, 1, 1],
 [1, 1, 1],
 ...
]

I also thought that if I transformed max_value_indices into a 100x1 array, it might work:

counts[max_value_indices[:,numpy.newaxis]] += 1

but this has the effect of updating just the elements in positions 0, 1, and 2:

[[1, 1, 1],
 [1, 1, 1],
 [1, 1, 1],
 [0, 0, 0],
 ...
]

I'm also happy to turn the indices array into an array of 0's and 1's, and then add it to the counts array each time, but I'm not sure how to construct that.

unutbu · Accepted Answer

You could use so-called advanced integer indexing (aka Multidimensional list-of-locations indexing):

In [24]: counts[np.arange(data.shape[0]), 
                np.argmax(data, axis=1)] += 1

In [25]: counts
Out[25]: 
array([[0, 1, 0],
       [0, 0, 1],
       [1, 0, 0],
       [1, 0, 0]])

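The first array, np.arange(data.shape[0]), specifies the row. The second array, np.argmax(data, axis=1), specifies the column. NumPy pairs the two arrays element by element, so exactly one (row, column) cell is incremented per row.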
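As a self-contained sketch of the above, using only the four rows shown in the question (the names rows, cols, whole_rows and one_hot are illustrative, not from the original post): it shows that indexing with the column array alone selects whole rows, which is why the attempt in the question incremented entire rows, and it also builds the array of 0's and 1's the question asks about by comparing against a range of column numbers.

```python
import numpy as np

# The four rows shown in the question (the real data is larger).
data = np.array([[5, 10, 3],
                 [6, 9, 12],
                 [13, 3, 9],
                 [9, 3, 1]])
counts = np.zeros(data.shape, dtype=int)

rows = np.arange(data.shape[0])   # one row number per row of data
cols = np.argmax(data, axis=1)    # column of the maximum in each row: [1, 2, 0, 0]

# A single index array is applied to the first axis only, so this picks
# whole rows 1, 2, 0, 0 rather than individual cells.
whole_rows = counts[cols]         # shape (4, 3)

# Pairing rows with cols addresses one cell per row.
counts[rows, cols] += 1

# Alternative: a 0/1 array built by broadcasting a (4, 1) column of
# indices against the (3,) range of column numbers.
one_hot = (cols[:, np.newaxis] == np.arange(data.shape[1])).astype(int)

print(counts)
print(one_hot)
```

With this sample data both printed arrays should be identical. One caveat: because rows contains each row number exactly once, the in-place += touches every target cell once. If an index pair could repeat, np.add.at(counts, (rows, cols), 1) would be the safer choice, since a plain += applies only one increment per repeated pair.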
