alisianoi

Reputation: 2353

Numpy: transpose result of advanced indexing

>>> import numpy as np
>>> X = np.arange(27).reshape(3, 3, 3)
>>> x = [0, 1]
>>> X[x, x, :]
array([[ 0,  1,  2],
       [12, 13, 14]])

I need to sum it along the 0 dimension, but in the real world the matrix is huge and I would prefer to sum it along the -1 dimension, which is faster due to memory layout. Hence I would like the result to be transposed:

array([[ 0, 12],
       [ 1, 13],
       [ 2, 14]])

How do I do that? I would like the result of numpy's "advanced indexing" to be implicitly transposed. Transposing it explicitly with .T at the end is even slower and is not an option.

Update1: In the real world, advanced indexing is unavoidable and the subscripts are not guaranteed to be the same.

>>> x = [0, 0, 1]
>>> y = [0, 1, 1]
>>> X[x, y, :]
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [12, 13, 14]])

Update2: To clarify that this is not an XY problem, here is the actual problem:

I have a large matrix X whose elements come from some probability distribution. The distribution of each element depends on that element's neighbourhood, but it is unknown, so I follow the Gibbs sampling procedure to build a matrix whose elements are drawn from it. In a nutshell, I make some initial guess for the matrix X and then keep iterating over its elements, updating each element x with a formula that depends on the neighbouring values of x.

So, for any element of the matrix I need to get its neighbours (advanced indexing) and perform some operation on them (summation in my example). I have used line_profiler to see that the line which takes most of the time in my code is the one taking the sum of an array along dimension 0 rather than -1. Hence I would like to know if there is a way to produce an already-transposed matrix as a result of advanced indexing.
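For illustration, the access pattern just described might look roughly like this (a minimal sketch; the array sizes and index lists are invented for illustration, not my real data):

```python
import numpy as np

# Hypothetical miniature of the setup described above: a small random
# "X" and hand-picked neighbour subscripts (in the real code these come
# from the Gibbs sweep, so all names and sizes here are made up)
np.random.seed(0)
X = np.random.randn(5, 5, 5)

rows = [0, 0, 1]
cols = [0, 1, 1]

# advanced indexing gathers one row per neighbour -> shape (3, 5)
neighbours = X[rows, cols, :]

# the hot line: collapsing the neighbour axis is a sum along dimension 0
neighbour_sum = neighbours.sum(axis=0)
print(neighbours.shape)     # (3, 5)
print(neighbour_sum.shape)  # (5,)
```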

Upvotes: 1

Views: 1211

Answers (1)

ali_m

Reputation: 74162

I would like to sum it along the 0 dimension but in the real world the matrix is huge and I would prefer to sum it along the -1 dimension, which is faster due to memory layout.

I'm not totally sure what you mean by this. If the underlying array is row-major (the default, i.e. X.flags.c_contiguous == True), then it may be slightly faster to sum it along the 0th dimension. Simply transposing an array using .T or np.transpose() does not, in itself, change how the array is laid out in memory.

For example:

# X is row-major
print(X.flags.c_contiguous)
# True

# Y is just a transposed view of X
Y = X.T

# the indices of the elements in Y are transposed, but their layout in memory
# is the same as in X, therefore Y is column-major rather than row-major
print(Y.flags.c_contiguous)
# False

You can convert from row-major to column-major, for example by using np.asfortranarray(X), but there is no way to perform this conversion without making a full copy of X in memory. Unless you're going to perform lots of operations over the columns of X, the conversion almost certainly won't be worthwhile.
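For example, the copy this conversion makes can be checked directly (a small illustrative sketch):

```python
import numpy as np

X = np.arange(27).reshape(3, 3, 3)

# asfortranarray returns a full column-major copy; X itself is untouched
F = np.asfortranarray(X)

print(X.flags.c_contiguous)    # True
print(F.flags.f_contiguous)    # True
print(np.shares_memory(X, F))  # False - a new buffer was allocated
```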

If you want to store the result of your summation in a column-major array, you could use the out= kwarg to X.sum(), e.g.:

result = np.empty((3, 3), order='F') # Fortran-order, i.e. column-major
X.sum(0, out=result)
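A quick check (illustrative, using the same small shapes as in the question) that the out= buffer keeps its column-major layout and holds the expected sums:

```python
import numpy as np

X = np.arange(27, dtype=float).reshape(3, 3, 3)

result = np.empty((3, 3), order='F')  # Fortran-order, i.e. column-major
X.sum(0, out=result)

print(result.flags.f_contiguous)         # True
print(np.array_equal(result, X.sum(0)))  # True
```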

In your case, though, the difference between summing over rows vs columns is likely to be minimal: since you are already indexing non-adjacent elements in X, you lose the spatial locality of reference that would normally make summing over rows slightly faster.

For example:

X = np.random.randn(100, 100, 100)

# summing over whole rows is slightly faster than summing over whole columns
%timeit X.sum(0)
# 1000 loops, best of 3: 438 µs per loop
%timeit X.T.sum(0)
# 1000 loops, best of 3: 486 µs per loop

# however, the locality advantage disappears when you are addressing
# non-adjacent elements using fancy indexing
%timeit X[[0, 0, 1], [0, 1, 1], :].sum()
# 100000 loops, best of 3: 4.72 µs per loop
%timeit X.T[[0, 0, 1], [0, 1, 1], :].sum()
# 100000 loops, best of 3: 4.63 µs per loop

Update

@senderle has mentioned in the comments that using numpy v1.6.2 he sees the opposite ordering of the timings, i.e. X.sum(-1) is faster than X.sum(0) for a row-major array. This seems to be related to the version of numpy he is using: with v1.6.2 I can reproduce the order that he observes, but with two newer versions (v1.8.2 and 1.10.0.dev-8bcb756) I observe the opposite (i.e. X.sum(0) is faster than X.sum(-1) by a small margin). Either way, I don't think changing the memory order of the array is likely to help much for the OP's case.

Upvotes: 4
