Reputation: 664
I have built a correlation matrix from a small test set and thresholded it, which gives the boolean matrix below. True values are those greater than a defined value (e.g. results = correlation_matrix > 0.75):
[[False False False True]
[False False True False]
[False True False True]
[ True False True False]]
Note that I also set the diagonal (top left to bottom right) to False. I only need half of the matrix, because it is symmetric about that diagonal.
Is there a function in NumPy (or another library) that returns the row/column of the values that are True? When I run this against real data (200k rows), it needs to be fast and must not use an inner loop: 200k*200k checks would be very slow. I imagine there has to be a NumPy, scikit-learn, or similar function that provides this, but I have not been able to find one.
The expected output from this would be:
[[1, 4], [2, 3], [3, 2], [3, 4], [4, 1], [4, 3]]
Ideally, given that the matrix is a mirror image, the output would be:
[[1, 4], [2, 3], [3, 4]]
Upvotes: 4
Views: 167
Reputation: 221524
To get the indices with 0-based indexing, one straightforward way is to mask out the entries below the diagonal with np.triu and then get the indices with np.argwhere:
np.argwhere(np.triu(a))
To mask out the diagonal entries as well, use np.triu(a, 1).
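As a minimal sketch, here is that approach applied to the boolean matrix from the question. The array name a follows the snippet above, and the + 1 shift to 1-based numbering is an assumption added only to match the numbering in the question's expected output:

```python
import numpy as np

# Boolean matrix from the question (diagonal already False).
a = np.array([[False, False, False, True],
              [False, False, True, False],
              [False, True, False, True],
              [True, False, True, False]])

# Keep only the entries strictly above the diagonal,
# then list the (row, col) positions of the remaining True values.
pairs = np.argwhere(np.triu(a, 1))

# np.argwhere is 0-based; add 1 to get the 1-based pairs used in the question.
pairs_one_based = pairs + 1
print(pairs_one_based.tolist())
```

Each row of the result is one (row, column) pair, so no Python-level loop over the matrix is needed.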
Another way is to use an explicit mask created with the help of broadcasting:
r = np.arange(a.shape[0])
a[r[:,None] >= r] = 0 # Note that this changes input array
indices = np.argwhere(a)
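If modifying the input is undesirable, a possible variant is to build the strict upper-triangle mask separately and combine it with the matrix instead of assigning into it. This is only a sketch: the name upper is hypothetical, and it assumes a is a boolean array like the one in the question:

```python
import numpy as np

# Boolean matrix from the question.
a = np.array([[False, False, False, True],
              [False, False, True, False],
              [False, True, False, True],
              [True, False, True, False]])

r = np.arange(a.shape[0])

# Broadcasting a column of row indices against a row of column indices:
# upper[i, j] is True only where i < j, i.e. strictly above the diagonal.
upper = r[:, None] < r

# Combine with the data instead of overwriting it, so a is left unchanged.
indices = np.argwhere(a & upper)
print(indices.tolist())
```

The trade-off is one extra temporary boolean array of the same shape as a, in exchange for leaving the original matrix intact.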
Upvotes: 4