Reputation: 10162
I have an index array that I use to assign elements into another array. Sometimes the index contains repeated entries; in that case I would like to keep the smallest of the corresponding values. Is that possible?
import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist
print(arr)
What I get:
[ 1. 0. 0. 1. 0. 3.]
What I would like to get:
[ 1. 0. 0. 1. 0. 1.]
Addendum
Actually I have a third array with the (vector) values to be inserted. So the problem is to insert rows from values into arr at the positions given by index, as in the following. However, when several rows share the same index, I want to keep the row with the minimum dist.
index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values
print(arr)
I get:
[[ 0. 1.]
[ 0. 0.]
[ 0. 0.]
[ 2. 3.]
[ 0. 0.]
[ 6. 7.]]
I would like to get:
[[ 0. 1.]
[ 0. 0.]
[ 0. 0.]
[ 2. 3.]
[ 0. 0.]
[ 4. 5.]]
Upvotes: 1
Views: 114
Reputation: 231385
If index is sorted, then itertools.groupby can be used to group the (index, dist) pairs by index and take the minimum dist of each group.
import itertools
np.array([(k, min(d for _, d in grp))
          for k, grp in itertools.groupby(zip(index, dist), key=lambda x: x[0])])
produces
array([[0, 1],
[3, 1],
[5, 1]])
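If index is not sorted, groupby will not merge the repeated entries, because it only groups adjacent equal keys. A minimal sketch of one way around that, assuming it is acceptable to sort the pairs first (the names pairs and mins and the shuffled input are mine, for illustration only), which also writes the result back into arr:

```python
import itertools
import numpy as np

index = [5, 0, 5, 3]   # deliberately unsorted (illustrative input)
dist = [3, 1, 1, 1]

# groupby only merges adjacent equal keys, so sort the pairs by index first.
pairs = sorted(zip(index, dist), key=lambda p: p[0])

# Map each distinct index to the smallest dist seen for it.
mins = {k: min(d for _, d in grp)
        for k, grp in itertools.groupby(pairs, key=lambda p: p[0])}

arr = np.zeros(6)
arr[list(mins.keys())] = list(mins.values())
print(arr)
```

This stays a pure-Python loop over the pairs, so it should be expected to scale like the groupby one-liner above rather than like a vectorized approach.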
This is about 8x slower than the version using np.unique, so for N=1000 it is similar in speed to the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. It looks like the Pandas approach has a substantial startup cost, which limits its speed for small N.
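For reference, here is a rough sketch of what a NumPy-only approach built on np.unique could look like. It is not necessarily the exact np.unique version timed above; it is just one way to do it, assuming the arrays from the addendum. The names order, uniq, first and keep are mine.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# Sort by index, breaking ties by dist.
# np.lexsort treats the LAST key in the tuple as the primary one.
order = np.lexsort((dist, index))

# In the sorted index, the first occurrence of each distinct value
# is the entry with the smallest dist for that index.
uniq, first = np.unique(index[order], return_index=True)
keep = order[first]   # positions in the original arrays to keep

arr = np.zeros((6, 2))
arr[uniq] = values[keep]
print(arr)
```

The same keep positions can be used for the scalar case too, e.g. assigning dist[keep] into a 1-D arr at uniq.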
Upvotes: 1
Reputation: 97291
Use groupby in pandas:
import numpy as np
import pandas as pd
index = [0,3,5,5]
dist = [1,1,1,3]
s = pd.Series(dist).groupby(index).min()
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)
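For the addendum (inserting rows of values rather than dist itself), the same idea should carry over by asking for the position of the minimum instead of the minimum. A sketch, assuming the Series keeps its default 0..n-1 labels so that idxmin returns row positions into values (the name rows is mine):

```python
import numpy as np
import pandas as pd

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]
values = np.arange(8).reshape(4, 2)

# For each distinct index, idxmin returns the label of the entry with the
# smallest dist; with the default RangeIndex that label is the row position.
rows = pd.Series(dist).groupby(index).idxmin()

arr = np.zeros((6, 2))
arr[rows.index.to_numpy()] = values[rows.to_numpy()]
print(arr)
```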
Upvotes: 1