Reputation: 119
I have the following problem with sorting a 2D array using the function argsort
.
More precisely, let's assume I have 5 points and have calculated the euclidean distances between them, which are stored in the 2D array D
:
D=np.array([[0,0.3,0.4,0.2,0.5],[0.3,0,0.2,0.6,0.1],
[0.4,0.2,0,0.5,0],[0.2,0.6,0.5,0,0.7],[0.5,0.1,0,0.7,0]])
D
array([[ 0. , 0.3, 0.4, 0.2, 0.5],
[ 0.3, 0. , 0.2, 0.6, 0.1],
[ 0.4, 0.2, 0. , 0.5, 0. ],
[ 0.2, 0.6, 0.5, 0. , 0.7],
[ 0.5, 0.1, 0. , 0.7, 0. ]])
Each element D[i,j]
(i,j=0,...,4) shows the distance between point i and point j. The diagonal entries are of course equal to zero, as they show the distance of a point to itself. However, there can be 2 or more points which overlap. For instance, in this particular case, point 4 is located in the same position of point 2, so that the the distances D[2,4]
and D[4,2]
are equal to zero.
Now, I want to sort this array D
: for each point i I want to know the indices of its neighbour points, from the closest to the furthest one. Of course, for a given point i the first point/index in the sorted array should be i, i.e. the the closest point to point 1 is 1. I used the function argsort
:
N = np.argsort(D)
N
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[2, 4, 1, 0, 3]])
This function sorts the distances properly until it gets to point 4: the first entry of the 4th row (counting from zero) is not 4 (D[4,4]=0
) as I would like. I would like the 4th row to be [4, 2, 1, 0, 3]
. The first entry is 2, because points 2 and 4 overlap so that D[2,4]=D[4,2]
, and between the same value entries D[2,4]=0
and D[4,2]=0
, argsort
selects always the first one.
Is there a way to fix this so that the sorted array N[i,j]
of D[i,j]
always starts with the indices corresponding to the diagonal entries D[i,i]=0
?
Thank you for your help, MarcoC
Upvotes: 1
Views: 232
Reputation: 3483
How about this:
import numpy as np
D = np.array([[ 0. , 0.3, 0.4, 0.2, 0.5],
[ 0.3, 0. , 0.2, 0.6, 0.1],
[ 0.4, 0.2, 0. , 0.5, 0. ],
[ 0.2, 0.6, 0.5, 0. , 0.7],
[ 0.5, 0.1, 0. , 0.7, 0. ]])
s = np.argsort(D)
line = np.argwhere(s[:,0] != np.arange(D.shape[0]))[0,0]
column = np.argwhere(s[line,:] == line)[0,0]
s[line,0], s[line, column] = s[line, column], s[line,0]
Just find the lines that don't have the dioganal element in front using numpy.argwhere
, then the column to swap and then swap the elements. Then s
contains what you want in the end.
This works for your example. In a general case, where numpy.argwhere
can contain several elements, one would have to run a loop over those elements instead of just typing [0,0]
at the end of the two lines of code above.
Hope I could help.
Upvotes: 0
Reputation: 221624
One way would be to fill the diagonal elements with something lesser than global minimum and then use argsort
-
In [286]: np.fill_diagonal(D,D.min()-1) # Or use -1 for filling
# if we know beforehand that the global minimum is 0
In [287]: np.argsort(D)
Out[287]:
array([[0, 3, 1, 2, 4],
[1, 4, 2, 0, 3],
[2, 4, 1, 0, 3],
[3, 0, 2, 1, 4],
[4, 2, 1, 0, 3]])
If you don't want the input array to be changed, make a copy and then do the diagonal filling.
Upvotes: 2