Reputation: 11

How can I calculate distance between points in each row of an array

I have an array like this and I have to find the distance between each points. How could I do so in python with numpy?

array([[  8139, 112607],
       [  8139, 115665],
       [  8132, 126563],
       [  8193, 113938],
       [  8193, 123714],
       [  8156, 120291],
       [  8373, 125253],
       [  8400, 131442],
       [  8400, 136354],
       [  8401, 129352],
       [  8439, 129909],
       [  8430, 135706],
       [  8430, 146359],
       [  8429, 139089],
       [  8429, 133243]])

Upvotes: 0

Answers (2)

mathfux

Reputation: 5949

Let's minimize this problem down to 4 points:

points = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714]])

In general, you need to do 2 steps:

Make an indices of pairs of points you want to take
Apply np.hypot for these pairs.

TL;DR

Making an indices of points

There are many ways of how you would like to create pairs of indices for each pair of points you'd like to take. But where do they come from? In every case it's a good idea to start building them from adjancency matrix.

Case 1

In the most common way you can start from building it like so:

adjacency = np.ones(shape=(len(points), len(points)), dtype=bool)
>>> adjacency
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]

It corresponds to indices you need to take like so:

adjacency_idx_view = np.transpose(np.nonzero(adjacency))
for n in adjacency_idx_view.reshape(len(points), len(points), 2):
>>> print(n.tolist())
[[0, 0], [1, 0], [2, 0], [3, 0]]
[[0, 1], [1, 1], [2, 1], [3, 1]]
[[0, 2], [1, 2], [2, 2], [3, 2]]
[[0, 3], [1, 3], [2, 3], [3, 3]]

And this is how you collect them:

x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2],
       [3, 3]], dtype=int64)

It could be done also manually like in @ Corralien's answer:

x = np.repeat(np.arange(len(points)), len(points))
y = np.tile(np.arange(len(points)), len(points))

Case 2

In previous case every pair of point is duplicated. There are also pairs with points duplicating. A better option is to omit this excessive data and take only pairs with index of first point being less than index of the second one:

adjacency = np.less.outer(np.arange(len(points)), np.arange(len(points)))
>>> print(adjacency)
[[False  True  True  True]
 [False False  True  True]
 [False False False  True]
 [False False False False]]
x, y = np.nonzero(adjacency)

This is not used widely. Although this lays beyond the hood of np.triu_indices. Hence, as an alternative, we could use:

x, y = np.triu_indices(len(points), 1)

And this results in:

>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])

Case 3 You could also try omit only pairs of duplicated points and leave pairs with points being swapped. As in Case 1 it costs 2x memory and consumption time so I'll leave it for demonstration purposes only:

adjacency = ~np.identity(len(points), dtype=bool)
>>> adjacency
array([[False,  True,  True,  True],
       [ True, False,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True, False]])
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2]], dtype=int64)

I'll leave making x and y manually (without masking) as an exercise for the others.

Apply `np.hypot`

Instead of np.sqrt(np.sum((a - b) ** 2, axis=1)) you could do np.hypot(np.transpose(a - b)). I'll take my Case 2 as my index generator:

def distance(points):
    x, y = np.triu_indices(len(points), 1)
    x_coord, y_coord = np.transpose(points[x] - points[y])
    return np.hypot(x_coord, y_coord)

>>> distance(points)
array([10898.00224812,  1727.84403231,  8049.18113848, 12625.14736548,
        2849.65296133,  9776.        ])

Upvotes: 1

Corralien

Reputation: 120559

You can use np.repeat and np.tile to create all combinations then compute the euclidean distance:

xy = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714],
               [8156, 120291], [8373, 125253], [8400, 131442], [8400, 136354],
               [8401, 129352], [8439, 129909], [8430, 135706], [8430, 146359],
               [8429, 139089], [8429, 133243]])

a = np.repeat(xy, len(xy), axis=0)
b = np.tile(xy, [len(xy), 1])
d = np.sqrt(np.sum((a - b) ** 2, axis=1))

The output of d is (196,) which is 14 x 14.

Update

but I have to do it in a function.

def distance(xy):
    a = np.repeat(xy, len(xy), axis=0)
    b = np.tile(xy, [len(xy), 1])
    return np.sqrt(np.sum((a - b) ** 2, axis=1))

d = distance(xy)