Reputation: 361
I have a three-dimensional array of the following structure:
x = np.array([[[1,2],
[3,4]],
[[5,6],
[7,8]]], dtype=np.double)
Additionally, I have an index array
idx = np.array([[0,1],[1,3]], dtype=np.int)
Each row of idx
defines the row/column indices for the placement of each sub-array along the 0
axis in x
into a two-dimensional array K
that is initialized as
K = np.zeros((4,4), dtype=np.double)
I would like to use fancy indexing/broadcasting to performing the indexing without a for
loop. I currently do it this way:
for i, id in enumerate(idx):
idx_grid = np.ix_(id,id)
K[idx_grid] += x[i]
Such that the result is:
>>> K = array([[ 1., 2., 0., 0.],
[ 3., 9., 0., 6.],
[ 0., 0., 0., 0.],
[ 0., 7., 0., 8.]])
Is this possible to do with fancy indexing?
Upvotes: 2
Views: 2187
Reputation: 221564
The intended operation is one of an accumulation
of values from x
into places indexed by idx
. You could think of those idx
places as bins
of a histogram data and the x
values as the weights that you need to accumulate for those bins. Now, to perform such a binning operation, np.bincount
could be used. Here's one such implementation with it -
# Get size info of expected output
N = idx.max()+1
# Extend idx to cover two axes, equivalent to `np.ix_`
idx1 = idx[:,None,:] + N*idx[:,:,None]
# "Accumulate" values from x into places indexed by idx1
K = np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)
Runtime tests -
1) Create inputs:
In [361]: # Create x and idx, with idx having unique elements in each row of idx,
...: # as otherwise the intended operation is not clear
...:
...: nrows = 100
...: max_idx = 100
...: ncols_idx = 2
...:
...: x = np.random.rand(nrows,ncols_idx,ncols_idx)
...: idx = np.random.randint(0,max_idx,(nrows,ncols_idx))
...:
...: valid_mask = ~np.any(np.diff(np.sort(idx,axis=1),axis=1)==0,axis=1)
...:
...: x = x[valid_mask]
...: idx = idx[valid_mask]
...:
2) Define functions:
In [362]: # Define the original and proposed (bincount based) approaches
...:
...: def org_approach(x,idx):
...: N = idx.max()+1
...: K = np.zeros((N,N), dtype=np.double)
...: for i, id in enumerate(idx):
...: idx_grid = np.ix_(id,id)
...: K[idx_grid] += x[i]
...: return K
...:
...:
...: def bincount_approach(x,idx):
...: N = idx.max()+1
...: idx1 = idx[:,None,:] + N*idx[:,:,None]
...: return np.bincount(idx1.ravel(),x.ravel()).reshape(N,N)
...:
3) Finally time them:
In [363]: %timeit org_approach(x,idx)
100 loops, best of 3: 2.13 ms per loop
In [364]: %timeit bincount_approach(x,idx)
10000 loops, best of 3: 32 µs per loop
Upvotes: 2
Reputation: 176810
Here's one alternative way. With x
, idx
and K
defined as in your question:
indices = (idx[:,None] + K.shape[1]*idx).ravel('f')
np.add.at(K.ravel(), indices, x.ravel())
Then we have:
>>> K
array([[ 1., 2., 0., 0.],
[ 3., 9., 0., 6.],
[ 0., 0., 0., 0.],
[ 0., 7., 0., 8.]])
To perform unbuffered inplace addition on NumPy arrays you need to use np.add.at
(to avoid using +=
in a for
loop).
However, it's slightly probelmatic to pass a list of 2D index arrays, and corresponding arrays to add at these indices, to np.add.at
. This is because the function interprets these lists of arrays as higher-dimensional arrays and IndexErrors are raised.
It's much simpler to pass in 1D arrays. You can temporarily ravel K
and x
to give you a 1D array of zeros and a 1D array of values to add to those zeros. The only fiddly part is constructing a corresponding 1D array of indices from idx
at which to add the values. This can be done via broadcasting with arithmetical operators and then ravelling, as shown above.
Upvotes: 3
Reputation: 5531
I do not think it is efficiently possible, since you have +=
in the loop. This means, you would have to "blow up" your array idx
by one dimension and reduce it again by utilizing np.sum(x[...], axis=...)
.
A minor optimization would be:
import numpy as np
xx = np.array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]], dtype=np.double)
idx = np.array([[0, 1], [1, 3]], dtype=np.int)
K0, K1 = np.zeros((4, 4), dtype=np.double), np.zeros((4, 4), dtype=np.double)
for k, i in enumerate(idx):
idx_grid = np.ix_(i, i)
K0[idx_grid] += xx[k]
for x, i in zip(xx, idx):
K1[np.ix_(i, i)] += x
print("K1 == K0:", np.allclose(K1, K0)) # prints: K1 == K0: True
PS: Do not use id
as a variable name, since it is a Python keyword.
Upvotes: 0