MarcoC
MarcoC

Reputation: 119

Python, issue with ties using argsort

I have the following problem with sorting a 2D array using the function argsort.

More precisely, let's assume I have 5 points and have calculated the euclidean distances between them, which are stored in the 2D array D:

D=np.array([[0,0.3,0.4,0.2,0.5],[0.3,0,0.2,0.6,0.1],
           [0.4,0.2,0,0.5,0],[0.2,0.6,0.5,0,0.7],[0.5,0.1,0,0.7,0]])
D
array([[ 0. ,  0.3,  0.4,  0.2,  0.5],
       [ 0.3,  0. ,  0.2,  0.6,  0.1],
       [ 0.4,  0.2,  0. ,  0.5,  0. ],
       [ 0.2,  0.6,  0.5,  0. ,  0.7],
       [ 0.5,  0.1,  0. ,  0.7,  0. ]])

Each element D[i,j] (i,j=0,...,4) shows the distance between point i and point j. The diagonal entries are of course equal to zero, as they show the distance of a point to itself. However, there can be 2 or more points which overlap. For instance, in this particular case, point 4 is located in the same position of point 2, so that the the distances D[2,4] and D[4,2] are equal to zero.

Now, I want to sort this array D: for each point i I want to know the indices of its neighbour points, from the closest to the furthest one. Of course, for a given point i the first point/index in the sorted array should be i, i.e. the the closest point to point 1 is 1. I used the function argsort:

N = np.argsort(D)
N
array([[0, 3, 1, 2, 4],
       [1, 4, 2, 0, 3],
       [2, 4, 1, 0, 3],
       [3, 0, 2, 1, 4],
       [2, 4, 1, 0, 3]])

This function sorts the distances properly until it gets to point 4: the first entry of the 4th row (counting from zero) is not 4 (D[4,4]=0) as I would like. I would like the 4th row to be [4, 2, 1, 0, 3]. The first entry is 2, because points 2 and 4 overlap so that D[2,4]=D[4,2], and between the same value entries D[2,4]=0 and D[4,2]=0, argsort selects always the first one.

Is there a way to fix this so that the sorted array N[i,j] of D[i,j] always starts with the indices corresponding to the diagonal entries D[i,i]=0?

Thank you for your help, MarcoC

Upvotes: 1

Views: 232

Answers (2)

Michael H.
Michael H.

Reputation: 3483

How about this:

import numpy as np

D = np.array([[ 0. ,  0.3,  0.4,  0.2,  0.5],
              [ 0.3,  0. ,  0.2,  0.6,  0.1],
              [ 0.4,  0.2,  0. ,  0.5,  0. ],
              [ 0.2,  0.6,  0.5,  0. ,  0.7],
              [ 0.5,  0.1,  0. ,  0.7,  0. ]])

s = np.argsort(D)
line = np.argwhere(s[:,0] != np.arange(D.shape[0]))[0,0]
column = np.argwhere(s[line,:] == line)[0,0]
s[line,0], s[line, column] = s[line, column], s[line,0]

Just find the lines that don't have the dioganal element in front using numpy.argwhere, then the column to swap and then swap the elements. Then s contains what you want in the end.

This works for your example. In a general case, where numpy.argwhere can contain several elements, one would have to run a loop over those elements instead of just typing [0,0] at the end of the two lines of code above.

Hope I could help.

Upvotes: 0

Divakar
Divakar

Reputation: 221624

One way would be to fill the diagonal elements with something lesser than global minimum and then use argsort -

In [286]: np.fill_diagonal(D,D.min()-1) # Or use -1 for filling
           # if we know beforehand that the global minimum is 0

In [287]: np.argsort(D)
Out[287]: 
array([[0, 3, 1, 2, 4],
       [1, 4, 2, 0, 3],
       [2, 4, 1, 0, 3],
       [3, 0, 2, 1, 4],
       [4, 2, 1, 0, 3]])

If you don't want the input array to be changed, make a copy and then do the diagonal filling.

Upvotes: 2

Related Questions