Accessing specific pairwise distances in a distance matrix (scipy / numpy)

Question

I am using scipy and its cdist function to compute a distance matrix from an array of vectors.

import numpy as np
from scipy.spatial import distance


vectorList = [(0, 10), (4, 8), (9.0, 11.0), (14, 14), (16, 19), (25.5, 17.5), (35, 16)]

# Convert to a numpy array
arr = np.array(vectorList)

# Compute the distance matrix and set self-comparisons (the diagonal) to NaN
d = distance.cdist(arr, arr)
np.fill_diagonal(d, np.nan)

Let's say I want to return all the distances that are below a specific threshold (6, for example).

#Find pairs of vectors whose separation distance is < 6
id1, id2 = np.nonzero(d<6)

#id1 --> array([0, 1, 1, 2, 2, 3, 3, 4]) 
#id2 --> array([1, 0, 2, 1, 3, 2, 4, 3])

I now have 2 arrays of indices.

Question: how can I return the distances between these pairs of vectors as an array or list?

4.47213595499958  #d[0][1]
4.47213595499958  #d[1][0]
5.830951894845301 #d[1][2]
5.830951894845301 #d[2][1]
5.830951894845301 #d[2][3]
5.830951894845301 #d[3][2]
5.385164807134504 #d[3][4]
5.385164807134504 #d[4][3]

d[id1][id2] returns a matrix, not a list. The only way I have found so far is to loop over the index pairs in Python, which doesn't seem like it should be necessary:

np.array([d[i1][i2] for i1, i2 in zip(id1, id2)])

Aleš Erjavec · Accepted Answer

Use

d[id1, id2]

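This is the form shown in the numpy.nonzero documentation example (i.e. a[np.nonzero(a > 3)]). It differs from the d[id1][id2] you are using: d[id1, id2] pairs the two index arrays element by element and returns d[id1[k], id2[k]] for each k, whereas d[id1][id2] first selects the rows listed in id1 and then indexes the rows of that intermediate result with id2.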
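As a minimal sketch of that difference, using the data from the question (the names paired and chained are only illustrative, not from the original post):

```python
import numpy as np
from scipy.spatial import distance

arr = np.array([(0, 10), (4, 8), (9.0, 11.0), (14, 14),
                (16, 19), (25.5, 17.5), (35, 16)])

# Distance matrix with the diagonal (self-comparisons) set to NaN
d = distance.cdist(arr, arr)
np.fill_diagonal(d, np.nan)

# Row and column indices of all entries below the threshold
id1, id2 = np.nonzero(d < 6)

# Paired indexing: element k is d[id1[k], id2[k]]
paired = d[id1, id2]

# Chained indexing: d[id1] selects whole rows, then [id2] selects
# rows of that intermediate (8, 7) array
chained = d[id1][id2]

print(paired.shape)   # (8,)
print(chained.shape)  # (8, 7)
```

Here paired should hold the same values as the list comprehension in the question, without the Python-level loop.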

See arrays.indexing for more details on numpy indexing.
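If only the values are needed and not the index arrays, a boolean mask should give the same result. The sketch below uses a small toy array like the one in the numpy.nonzero example; the variable names are assumptions made for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

mask = a > 3

# Explicit index arrays, then paired indexing
rows, cols = np.nonzero(mask)
via_indices = a[rows, cols]

# Passing the tuple returned by np.nonzero directly
via_tuple = a[np.nonzero(mask)]

# Indexing with the boolean mask itself
via_mask = a[mask]

print(via_indices)  # [4 5 6 7 8 9]
```

All three forms select the same elements in row-major order, so for the question's matrix d[d < 6] would be a shorter way to get the distances when id1 and id2 are not needed afterwards.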
