Reputation: 210832
I have the following square DataFrame:
In [104]: d
Out[104]:
a b c d e
a inf 5.909091 8.636364 7.272727 4.454545
b 7.222222 inf 8.666667 7.666667 1.777778
c 15.833333 13.000000 inf 9.166667 14.666667
d 4.444444 3.833333 3.055556 inf 4.833333
e 24.500000 8.000000 44.000000 43.500000 inf
this is modified distance matrix, representing pairwise distance between objects ['a','b','c','d','e'], where each row is divided by a coefficient (weight) and all diagonal elements artificially set to np.inf
.
How may I get a list/vector of indices like as follows in an efficient (vectorized) way:
d # index of minimal element in the column `a`
a # index of minimal element in the column `b` (excluding already found indices: [d])
b # index of minimal element in the column `c` (excluding already found indices: [d,a])
c # index of minimal element in the column `d` (excluding already found indices: [d,a,b])
I.e. in the first column we had found index d
, so when we search for a minimum in the second column we are excluding row with index d
(found previously in the first column) - this would be a
.
When we are looking for the minimum in the third column we are excluding rows with indices found previously (['d','a']
) - this would be b
.
When we are looking for the minimum in the fourth column we are excluding rows with indices found previously (['d','a','b']
) - this would be c
.
I don't need diagonal (inf
) elements, so the resulting list/vector will contain d.shape[0] - 1
elements.
I.e. the resulting list will look like: ['d','a','b','c']
or in case of Numpy solution the corresponding numerical indices: [3,0,1,2]
It's not a problem to do it using slow for loop
solution, but I can't wrap my head around a vectorized (fast) solution...
Upvotes: 6
Views: 132
Reputation: 164623
A loop is the only solution I can see here.
But you can use numpy
+ numba
to optimise.
from numba import jit
@jit(nopython=True)
def get_min_lookback(A, res):
for i in range(A.shape[1]):
res[i] = np.argmin(A[:, i])
A[res[i], :] = np.inf
return res
arr = df.values
get_min_lookback(arr, np.zeros(arr.shape[1], dtype=int))
# array([3, 0, 1, 2, 0])
Upvotes: 2
Reputation: 210832
Here is my solution, which I'm sure is not the best one:
resulting list:
res = []
main function, that will search for a minimum in a column, excluding previously found indices and adding found index to res
:
def f(col):
ret = col.loc[~col.index.isin(res)].idxmin()
if ret not in res:
res.append(ret)
apply function to each column:
_ = d.apply(f)
result:
In [55]: res
Out[55]: ['d', 'a', 'b', 'c', 'e']
excluding last element:
In [56]: res[:-1]
Out[56]: ['d', 'a', 'b', 'c']
Upvotes: 1