Emanuele Paolini

Reputation: 10162

Using indices with repeated values: how to keep the smallest one

I have an index array that I use to assign elements into another array. Sometimes the index has repeated entries; in that case I would like to keep the smallest of the corresponding values. Is that possible?

import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
arr = np.zeros(6)
arr[index] = dist  # with repeated indices, the last assignment wins
print(arr)

what I get:

[ 1.  0.  0.  1.  0.  3.]

what I would like to get:

[ 1.  0.  0.  1.  0.  1.]

Addendum

Actually, I have a third array holding the (vector) values to be inserted. So the problem is to insert rows from values into arr at the positions given by index, as in the following. However, when several rows share the same index, I want to choose the row whose dist is smallest.

import numpy as np
index = [0,3,5,5]
dist = [1,1,1,3]
values = np.arange(8).reshape(4,2)
arr = np.zeros((6,2))
arr[index] = values  # row 5 ends up with the last duplicate, not the one with minimum dist
print(arr)

I get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 6.  7.]]

I would like to get:

[[ 0.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 2.  3.]
 [ 0.  0.]
 [ 4.  5.]]

Upvotes: 1

Views: 114

Answers (2)

hpaulj

Reputation: 231385

If index is sorted, then itertools.groupby could be used to group that list.

import itertools
import numpy as np
np.array([(key, min(d for _, d in group)) for key, group in
    itertools.groupby(zip(index, dist), key=lambda pair: pair[0])])

produces

array([[0, 1],
       [3, 1],
       [5, 1]])

This is about 8x slower than the version using np.unique. So for N=1000 it is similar to the Pandas version (I'm guessing, since something is screwy with my Pandas import). For larger N the Pandas version is better. It looks like the Pandas approach has a substantial startup cost, which limits its speed for small N.
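
The np.unique version referred to above is not shown in this thread. As a rough sketch of how such an approach might look (not necessarily the version that was timed), one can sort by index with dist as a tie-breaker using np.lexsort, then use np.unique with return_index=True to pick the first row of each group. The names order, uniq, first, arr_dist and arr_values are made up for this illustration. Unlike groupby, it does not need index to be sorted beforehand, and it also covers the addendum with a values array.

```python
import numpy as np

index = np.array([0, 3, 5, 5])
dist = np.array([1, 1, 1, 3])
values = np.arange(8).reshape(4, 2)

# lexsort treats the LAST key as primary: sort by index, ties broken by dist
order = np.lexsort((dist, index))

# first occurrence of each index in the sorted order has the smallest dist
uniq, first = np.unique(index[order], return_index=True)
winners = order[first]  # positions in the original arrays

# scalar case from the question
arr_dist = np.zeros(6)
arr_dist[uniq] = dist[winners]

# vector case from the addendum
arr_values = np.zeros((6, 2))
arr_values[uniq] = values[winners]

print(arr_dist)
print(arr_values)
```

Because every target position is written only once, the result no longer depends on the order in which NumPy processes repeated indices.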

Upvotes: 1

HYRY

Reputation: 97291

Use groupby in pandas:

import numpy as np
import pandas as pd
index = [0,3,5,5]
dist = [1,1,1,3]
s = pd.Series(dist).groupby(index).min()  # smallest dist per distinct index
arr = np.zeros(6)
arr[s.index] = s.values
print(arr)
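
For the addendum in the question (inserting rows of values rather than dist itself), the same groupby idea can probably be extended with idxmin, which returns the position of the minimum instead of the minimum value. This is only a sketch under the assumption that the Series keeps its default 0..N-1 index, so that the labels returned by idxmin are also row positions in values; the name rows is introduced here for illustration.

```python
import numpy as np
import pandas as pd

index = [0, 3, 5, 5]
dist = [1, 1, 1, 3]
values = np.arange(8).reshape(4, 2)

# for each distinct index, the row position where dist is smallest
rows = pd.Series(dist).groupby(index).idxmin()

arr = np.zeros((6, 2))
# rows.index holds the target positions, rows.values the winning source rows
arr[rows.index.to_numpy()] = values[rows.to_numpy()]
print(arr)
```

If two entries with the same index also have the same dist, idxmin picks the first one it encounters.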

Upvotes: 1
