Macter

Reputation: 132

Pandas get minimum value of row and store corresponding index

I have a distance matrix stored as a pandas DataFrame df, which contains the distances in kilometres between 400 pairs of co-ordinates:

        0          1            2           3          4    ....
0   0.000000    1.740838    2.496827    2.577736    1.698938
1   1.740838    0.000000    1.373490    1.741935    1.261969
2   2.496827    1.373490    0.000000    0.420828    0.812797
3   2.577736    1.741935    0.420828    0.000000    1.131974
4   1.698938    1.261969    0.812797    1.131974    0.000000
.
.
.

For each point, I want to know how close its nearest other point is, which I get with:

df[df > 0].min(axis=1)

0      0.074083
1      0.004708
2      0.119431
3      0.167242
4      0.018095

This gives the distance in km to the nearest other point. I also want to store the index of that nearest point, so that I have:

     nearest_dist  nearest_id
0      0.074083       3
1      0.004708       151
2      0.119431       7
3      0.167242       4
4      0.018095       81

Upvotes: 1

Views: 576

Answers (2)

ALollz

Reputation: 59549

Stack the DataFrame into a Series, keep only the positive distances, then use groupby + idxmin to select each row's minimum from that Series.

s = df.rename_axis(columns='nearest_id').stack().loc[lambda x: x > 0]
s = (s.loc[s.groupby(level=0).idxmin()]
      .to_frame('nearest_dist')
      .reset_index(-1))

  nearest_id  nearest_dist
0          4      1.698938
1          4      1.261969
2          3      0.420828
3          2      0.420828
4          2      0.812797

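You can also build the DataFrame with NumPy.

import numpy as np
import pandas as pd

# Replace zeros (the diagonal) with NaN so the nan-aware functions skip them
arr = df.where(df > 0).to_numpy()

pd.DataFrame({'nearest_id': np.nanargmin(arr, 1),   # column position, not label
              'nearest_dist': np.nanmin(arr, 1)},
             index=df.index)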
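
One caveat with the NumPy route: np.nanargmin returns integer column positions, which coincide with the column labels only when the columns are numbered 0..n-1 as in the question. The following is a minimal sketch, not part of the original answer, that maps positions back to labels. It uses the 5x5 excerpt from the question, and the string labels 'p0'..'p4' are hypothetical, introduced only to make the mapping visible.

```python
import numpy as np
import pandas as pd

# 5x5 excerpt of the distance matrix shown in the question
data = [
    [0.000000, 1.740838, 2.496827, 2.577736, 1.698938],
    [1.740838, 0.000000, 1.373490, 1.741935, 1.261969],
    [2.496827, 1.373490, 0.000000, 0.420828, 0.812797],
    [2.577736, 1.741935, 0.420828, 0.000000, 1.131974],
    [1.698938, 1.261969, 0.812797, 1.131974, 0.000000],
]

# Hypothetical labels (assumption): the question uses plain integers instead
labels = ['p0', 'p1', 'p2', 'p3', 'p4']
df = pd.DataFrame(data, index=labels, columns=labels)

# Mask zero distances as NaN, then take the per-row minimum and its position
arr = df.where(df > 0).to_numpy()
pos = np.nanargmin(arr, axis=1)

res = pd.DataFrame({'nearest_id': df.columns.to_numpy()[pos],  # position -> label
                    'nearest_dist': np.nanmin(arr, axis=1)},
                   index=df.index)
print(res)
```

With the integer labels from the question, the extra lookup is a no-op, so it costs little to keep it in.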

Upvotes: 3

Ben.T

Reputation: 29635

You can use concat with the same idea you used for min, but adding idxmin:

m = df>0
res = pd.concat([df[m].min(axis=1), df[m].idxmin(axis=1)], 
                axis=1, keys=['nearest_dist','nearest_id'])

print(res)
   nearest_dist nearest_id
0      1.698938          4
1      1.261969          4
2      0.420828          3
3      0.420828          2
4      0.812797          2

Or, even simpler, with agg:

res = (df[m].agg(['min', 'idxmin'], axis=1)
            .rename(columns={'min':'nearest_dist', 'idxmin':'nearest_id'}))
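
For anyone who wants to run the concat version without the full 400-point matrix, here is a self-contained sketch. It assumes only the 5x5 excerpt shown in the question, so the numbers match the output printed in the answers rather than the question's own results.

```python
import pandas as pd

# 5x5 excerpt of the distance matrix shown in the question
data = [
    [0.000000, 1.740838, 2.496827, 2.577736, 1.698938],
    [1.740838, 0.000000, 1.373490, 1.741935, 1.261969],
    [2.496827, 1.373490, 0.000000, 0.420828, 0.812797],
    [2.577736, 1.741935, 0.420828, 0.000000, 1.131974],
    [1.698938, 1.261969, 0.812797, 1.131974, 0.000000],
]
df = pd.DataFrame(data)

# Boolean mask: True wherever the distance is strictly positive
m = df > 0

# df[m] turns the masked-out zeros into NaN, which min and idxmin skip
res = pd.concat([df[m].min(axis=1), df[m].idxmin(axis=1)],
                axis=1, keys=['nearest_dist', 'nearest_id'])
print(res)
```

Note that the df > 0 mask also hides any two distinct points that sit at exactly the same location, since their distance is 0 as well.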

Upvotes: 3
