Reputation: 111
Is there a neat way to assign values to given indices in an array, and average values in repeated indices? For example:
a = np.array([0, 0, 0, 0, 0])
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])
I want to assign the values in b to a at the corresponding indices given in ind. Where an index is repeated, the values should be averaged, so a[1] should be the average of 2 and 3.
I can try a for-loop:
hit = np.zeros_like(a)
for i in range(ind.size):
    hit[ind[i]] += 1
    a[ind[i]] += b[i]
a = a / hit
But this code looks dirty. Is there any better way to do the job?
Upvotes: 0
Views: 66
Reputation: 10590
This might not necessarily be cleaner or faster, but here's an alternative that I think is easy to read:
a = [[] for _ in range(5)]
for i, x in zip(ind, b):
    a[i].append(x)
[np.mean(x) if len(x) else 0 for x in a]
Upvotes: 0
Reputation: 231738
In [56]: a = np.zeros(5)
...: hit = np.zeros_like(a)
...: for i in range(ind.size):
...:     hit[ind[i]] += 1
...:     a[ind[i]] += b[i]
In [57]: a
Out[57]: array([0., 5., 4., 5., 0.])
In [58]: hit
Out[58]: array([0., 2., 1., 1., 0.])
The mention of duplicate indices brings to mind the .at ufunc method:
In [59]: a = np.zeros(5)
In [60]: a = np.zeros(5)
...: hit = np.zeros_like(a)
...: np.add.at(a,ind,b)
...: np.add.at(hit,ind,1)
In [61]: a
Out[61]: array([0., 5., 4., 5., 0.])
In [62]: hit
Out[62]: array([0., 2., 1., 1., 0.])
This isn't quite as fast as a[ind]=b (which does not accumulate values at repeated indices), but it is faster than your loop. np.bincount might well be better for this task, but add.at is worth knowing and testing.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.at.html
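The session above stops at the sums and the hit counts. A minimal sketch of the remaining step, assuming positions that were never hit should stay 0 (the names sums and avg are chosen here for illustration and are not from the session):

```python
import numpy as np

ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

# Accumulate sums and hit counts; np.add.at handles repeated indices.
sums = np.zeros(5)
hit = np.zeros(5)
np.add.at(sums, ind, b)
np.add.at(hit, ind, 1)

# Divide only where hit > 0; elsewhere keep the 0 supplied through `out`.
avg = np.divide(sums, hit, out=np.zeros_like(sums), where=hit > 0)
# avg[1] is 2.5, the mean of 2 and 3
```

Passing out together with where avoids the divide-by-zero warning that a plain sums / hit would raise for the untouched positions.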
Upvotes: 0
Reputation: 53109
Here is a vectorized method. The actual logic is close to your own solution.
n,d = (np.bincount(ind,x,a.size) for x in (b,None))
valid = d!=0
np.copyto(a,np.divide(n,d,where=valid),where=valid)
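The same idea can be spelled out step by step. This is only a sketch, assuming a is a float array (writing 2.5 into an integer array would not work as intended); the names sums and counts are illustrative:

```python
import numpy as np

a = np.zeros(5)                 # float array, so averages like 2.5 fit
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

# Per-index sum of b (weights) and per-index count of occurrences.
sums = np.bincount(ind, weights=b, minlength=a.size)
counts = np.bincount(ind, minlength=a.size)

# Only overwrite positions that actually appear in ind.
valid = counts > 0
a[valid] = sums[valid] / counts[valid]
```

Positions that never appear in ind keep whatever value a already held.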
Upvotes: 0
Reputation: 384
You could do this using np.where.
import numpy as np
a = np.array([0, 0, 0, 0, 0]).astype('float64')
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])
for i in set(ind):
    a[i] = np.mean(b[np.where(ind == i)])
This results in:
In [5]: a
Out[5]: array([0. , 2.5, 4. , 5. , 0. ])
You are essentially finding all indices of ind where the value of ind[index] is equal to i, then obtaining the mean of the values at those indices in b and assigning that mean to a[i]. Hope this helps!
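As a small variation on the same idea, the boolean mask ind == i can index b directly, so np.where is optional. A sketch, using np.unique in place of set (my substitution, not part of the answer above):

```python
import numpy as np

a = np.zeros(5)
ind = np.array([1, 1, 2, 3])
b = np.array([2, 3, 4, 5])

# For each distinct index, average the entries of b that map to it.
for i in np.unique(ind):
    a[i] = b[ind == i].mean()
```

This still loops once per distinct index, so it is a readability tweak rather than a speed-up.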
Upvotes: 1