Reputation: 7879
I have a numpy cosine similarity matrix. I want to find the indices of the n largest values, but exclude the 1.0
's on the diagonals and only for the lower triangular region in it.
similarities = [[ 1. 0.18898224 0.16903085]
[ 0.18898224 1. 0.67082039]
[ 0.16903085 0.67082039 1. ]]
In this case, if I want the two highest values, I would want it to return [1, 0]
and [2, 1]
.
I have tried using argpartition
but this does not return what I'm looking for
n_select = 1
most_similar = (-similarities).argpartition(n_select, axis=None)[:n_select]
How can I get the n highest values excluding the diagonal 1's and also exclude the upper triangular elements?
Upvotes: 2
Views: 805
Reputation: 10119
Remember that the diagonal elements of a square matrix have a unique property: i+j=n, where n is matrix dimension. Then, you can just find the n + number of (diagonal elements) max elements of the array and then iterate through them and exclude the tuples (i,j) where i+j=n. Hope it helps!
Upvotes: 0
Reputation: 221534
Approach #1
One approach with np.tril_indices
-
def n_largest_indices_tril(a, n=2):
m = a.shape[0]
r,c = np.tril_indices(m,-1)
idx = a[r,c].argpartition(-n)[-n:]
return zip(r[idx], c[idx])
Sample run -
In [39]: a
Out[39]:
array([[ 1. , 0.4 , 0.59, 0.15, 0.29],
[ 0.4 , 1. , 0.03, 0.57, 0.57],
[ 0.59, 0.03, 1. , 0.9 , 0.52],
[ 0.15, 0.57, 0.9 , 1. , 0.37],
[ 0.29, 0.57, 0.52, 0.37, 1. ]])
In [40]: n_largest_indices_tril(a, n=2)
Out[40]: [(2, 0), (3, 2)]
In [41]: n_largest_indices_tril(a, n=3)
Out[41]: [(4, 1), (2, 0), (3, 2)]
Approach #2
For performance, we might want to avoid generating all lower triangular indices and instead work with a mask, giving us a second method to solve our case, like so -
def n_largest_indices_tril_v2(a, n=2):
m = a.shape[0]
r = np.arange(m)
mask = r[:,None] > r
idx = a[mask].argpartition(-n)[-n:]
clens = np.arange(m).cumsum()
grp_start = clens[:-1]
grp_stop = clens[1:]-1
rows = np.searchsorted(grp_stop, idx)+1
cols = idx - grp_start[rows-1]
return zip(rows, cols)
In [143]: # Setup symmetric array
...: N = 1000
...: a = np.random.rand(N,N)*0.9
...: np.fill_diagonal(a,1)
...: m = a.shape[0]
...: r,c = np.tril_indices(m,-1)
...: a[r,c] = a[c,r]
In [144]: %timeit n_largest_indices_tril(a, n=2)
100 loops, best of 3: 12.5 ms per loop
In [145]: %timeit n_largest_indices_tril_v2(a, n=2)
100 loops, best of 3: 7.85 ms per loop
For n
smallest indices
For getting n
smallest ones, simply use ndarray.argpartition(n)[:n]
instead for both of the approaches.
Upvotes: 4