culebrón
culebrón

Reputation: 36463

How to do this operation in numPy?

I have an array X of 3D coords of N points (N*3) and want to calculate the eukledian distance between each pair of points.

I can do this by iterating over X and comparing them with the threshold.

coords = array([v.xyz for v in vertices])
for vertice in vertices:
    tests = np.sum(array(coords - vertice.xyz) ** 2, 1) < threshold
    closest = [v for v, t in zip(vertices, tests) if t]

Is this possible to do in one operation? I recall linear algebra from 10 years ago and can't find a way to do this.

Probably this should be a 3D array (point a, point b, axis) and then summed by axis dimension.

edit: found the solution myself, but it doesn't work on big datasets.

    coords = array([v.xyz for v in vertices])
    big = np.repeat(array([coords]), len(coords), 0)
    big_same = np.swapaxes(big, 0, 1)
    tests = np.sum((big - big_same) ** 2, 0) < thr_square

    for v, test_vector in zip(vertices, tests):
        v.closest = self.filter(vertices, test_vector)

Upvotes: 3

Views: 334

Answers (3)

Fred Foo
Fred Foo

Reputation: 363547

Use scipy.spatial.distance. If X is an n×3 array of points, you can get an n×n distance matrix from

from scipy.spatial import distance
D = distance.squareform(distance.pdist(X))

Then, the closest to point i is the point with index

np.argsort(D[i])[1]

(The [1] skips over the value in the diagonal, which will be returned first.)

Upvotes: 2

sega_sai
sega_sai

Reputation: 8538

If xyz is the array with your coordinates, then the following code will compute the distance matrix (works fast till the moment when you have enough memory to store N^2 distances):

xyz = np.random.uniform(size=(1000,3))
distances = (sum([(xyzs[:,i][:,None]-xyzs[:,i][None,:])**2 for i in range(3)]))**.5

Upvotes: 0

Brendan Wood
Brendan Wood

Reputation: 6440

I'm not quite sure what you're asking here. If you're computing the Euclidean distance between each pair of points in an N-point space, it would make sense to me to represent the results as a look-up matrix. So for N points, you'd get an NxN symmetric matrix. Element (3, 5) would represent the distance between points 3 and 5, whereas element (2, 2) would be the distance between point 2 and itself (zero). This is how I would do it for random points:

import numpy as np

N = 5 

coords = np.array([np.random.rand(3) for _ in range(N)])
dist = np.zeros((N, N)) 

for i in range(N):
    for j in range(i, N): 
        dist[i, j] = np.linalg.norm(coords[i] - coords[j])
        dist[j, i] = dist[i, j]

print dist

Upvotes: 0

Related Questions