johntfoster
johntfoster

Reputation: 361

Numpy indexing 3-dimensional array into 2-dimensional array

I have a three-dimensional array of the following structure:

x = np.array([[[1,2],
               [3,4]],
              [[5,6],
               [7,8]]], dtype=np.double)

Additionally, I have an index array

idx = np.array([[0,1],[1,3]], dtype=np.int)

Each row of idx defines the row/column indices for the placement of each sub-array along the 0 axis in x into a two-dimensional array K that is initialized as

K = np.zeros((4,4), dtype=np.double)

I would like to use fancy indexing/broadcasting to performing the indexing without a for loop. I currently do it this way:

for i, id in enumerate(idx):

    idx_grid = np.ix_(id,id)

    K[idx_grid] += x[i]

Such that the result is:

>>> K = array([[ 1.,  2.,  0.,  0.],
               [ 3.,  9.,  0.,  6.],
               [ 0.,  0.,  0.,  0.],
               [ 0.,  7.,  0.,  8.]])

Is this possible to do with fancy indexing?

Upvotes: 2

Views: 2187

Answers (3)

Divakar
Divakar

Reputation: 221564

The intended operation is one of an accumulation of values from x into places indexed by idx. You could think of those idx places as bins of a histogram data and the x values as the weights that you need to accumulate for those bins. Now, to perform such a binning operation, np.bincount could be used. Here's one such implementation with it -

# Get size info of expected output
N = idx.max()+1

# Extend idx to cover two axes, equivalent to `np.ix_`
idx1 = idx[:,None,:] + N*idx[:,:,None]

# "Accumulate" values from x into places indexed by idx1
K = np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)

Runtime tests -

1) Create inputs:

In [361]: # Create x and idx, with idx having unique elements in each row of idx, 
     ...: # as otherwise the intended operation is not clear
     ...: 
     ...: nrows = 100
     ...: max_idx = 100
     ...: ncols_idx = 2
     ...: 
     ...: x = np.random.rand(nrows,ncols_idx,ncols_idx)
     ...: idx = np.random.randint(0,max_idx,(nrows,ncols_idx))
     ...: 
     ...: valid_mask = ~np.any(np.diff(np.sort(idx,axis=1),axis=1)==0,axis=1)
     ...: 
     ...: x = x[valid_mask]
     ...: idx = idx[valid_mask]
     ...: 

2) Define functions:

In [362]: # Define the original and proposed (bincount based) approaches
     ...: 
     ...: def org_approach(x,idx):
     ...:   N = idx.max()+1
     ...:   K = np.zeros((N,N), dtype=np.double)
     ...:   for i, id in enumerate(idx):    
     ...:       idx_grid = np.ix_(id,id)    
     ...:       K[idx_grid] += x[i]         
     ...:   return K
     ...: 
     ...: 
     ...: def bincount_approach(x,idx):
     ...:   N = idx.max()+1
     ...:   idx1 = idx[:,None,:] + N*idx[:,:,None]
     ...:   return np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)
     ...: 

3) Finally time them:

In [363]: %timeit org_approach(x,idx)
100 loops, best of 3: 2.13 ms per loop

In [364]: %timeit bincount_approach(x,idx)
10000 loops, best of 3: 32 µs per loop

Upvotes: 2

Alex Riley
Alex Riley

Reputation: 176810

Here's one alternative way. With x, idx and K defined as in your question:

indices = (idx[:,None] + K.shape[1]*idx).ravel('f')
np.add.at(K.ravel(), indices, x.ravel())

Then we have:

>>> K
array([[ 1.,  2.,  0.,  0.],
       [ 3.,  9.,  0.,  6.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  7.,  0.,  8.]])

To perform unbuffered inplace addition on NumPy arrays you need to use np.add.at (to avoid using += in a for loop).

However, it's slightly probelmatic to pass a list of 2D index arrays, and corresponding arrays to add at these indices, to np.add.at. This is because the function interprets these lists of arrays as higher-dimensional arrays and IndexErrors are raised.

It's much simpler to pass in 1D arrays. You can temporarily ravel K and x to give you a 1D array of zeros and a 1D array of values to add to those zeros. The only fiddly part is constructing a corresponding 1D array of indices from idx at which to add the values. This can be done via broadcasting with arithmetical operators and then ravelling, as shown above.

Upvotes: 3

Dietrich
Dietrich

Reputation: 5531

I do not think it is efficiently possible, since you have += in the loop. This means, you would have to "blow up" your array idx by one dimension and reduce it again by utilizing np.sum(x[...], axis=...). A minor optimization would be:

import numpy as np

xx = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]], dtype=np.double)

idx = np.array([[0, 1], [1, 3]], dtype=np.int)

K0, K1 = np.zeros((4, 4), dtype=np.double), np.zeros((4, 4), dtype=np.double)

for k, i in enumerate(idx):
    idx_grid = np.ix_(i, i)
    K0[idx_grid] += xx[k]

for x, i in zip(xx, idx):
    K1[np.ix_(i, i)] += x

print("K1 == K0:", np.allclose(K1, K0))  # prints: K1 == K0: True

PS: Do not use id as a variable name, since it is a Python keyword.

Upvotes: 0

Related Questions