Ana_1960

Reputation: 39

How to calculate the pairwise distances between all data points

I want to check which data points in X are close to each other and which are far apart by calculating the distances between them. My attempt below only ever gives zeros. Is this possible?

import numpy as np

X = np.random.rand(20, 10)
dist = (X - X) ** 2
print(X)

Upvotes: 1

Views: 229

Answers (4)

Vijay Mariappan

Reputation: 17191

Using just NumPy, you can do either

np.linalg.norm((X - X[:,None]),axis=-1)

or,

np.sqrt(np.square(X - X[:,None]).sum(-1))
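To make the broadcasting step explicit, and to handle the zero self-distances the question mentions, here is a minimal sketch. It assumes Euclidean distance is what is wanted; the names `diff`, `masked` and `nearest` are illustrative and not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
X = rng.random((20, 10))

# X[:, None] has shape (20, 1, 10). Broadcasting it against X (20, 10)
# produces every pairwise difference, shape (20, 20, 10).
diff = X - X[:, None]
dist = np.linalg.norm(diff, axis=-1)  # shape (20, 20); dist[i, j] = ||X[j] - X[i]||

# The diagonal is each point's distance to itself, which is always 0.
# Replace it with inf on a copy so it is never selected as a minimum.
masked = dist.copy()
np.fill_diagonal(masked, np.inf)

nearest = masked.argmin(axis=1)  # for each point, the index of the closest other point
```

The zeros on the diagonal are expected rather than a bug; masking them is one way to ignore them when looking for close points.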

Upvotes: 1

Daniel Quiroga

Reputation: 1

I assume you want a way to keep track of the distances, correct? If so, you can build a dictionary with the distances as keys and, as each value, a list of tuples holding the points that are that far apart. Iterating through the keys in ascending order then gives the distances from smallest to largest, along with the points for each distance. One way to do this is to brute-force every possible pair of points.

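import numpy as np

dist = dict()
X = np.random.rand(20, 10)
for indexOfNumber1 in range(len(X) - 1):
    # start at indexOfNumber1 + 1 so each pair is visited once and no point is paired with itself
    for indexOfNumber2 in range(indexOfNumber1 + 1, len(X)):
        # Euclidean distance: sum the squared differences before taking the root
        distance = np.sqrt(np.sum((X[indexOfNumber1] - X[indexOfNumber2]) ** 2))
        pair = (X[indexOfNumber1], X[indexOfNumber2])
        if distance not in dist:
            dist[distance] = [pair]
        else:
            # append modifies the list in place and returns None, so do not reassign
            dist[distance].append(pair)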

After this code runs, the dictionary dist contains every pairwise distance between the points, each mapped to the pairs of points that are that distance apart.
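Iterating through the keys in ascending order could look like the sketch below. It is an illustration under assumptions: it uses a smaller array, stores row indices rather than the points themselves so the output stays readable, and the names `closest_pairs` and `farthest_pairs` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
X = rng.random((5, 3))          # small example: 5 points, 10 unordered pairs

dist = {}
for i in range(len(X) - 1):
    for j in range(i + 1, len(X)):
        d = float(np.linalg.norm(X[i] - X[j]))
        # store the pair of row indices under its distance
        dist.setdefault(d, []).append((i, j))

# sorted() on a dict iterates over its keys in ascending order
for d in sorted(dist):
    print(f"{d:.4f}: {dist[d]}")

closest_pairs = dist[min(dist)]   # pairs at the smallest distance
farthest_pairs = dist[max(dist)]  # pairs at the largest distance
```

With random floating-point data, two pairs almost never share exactly the same distance, so most lists will hold a single pair.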

Upvotes: 0

PaulS

Reputation: 25313

Another possible solution:

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.rand(20, 10)
cdist(X, X)
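Because `cdist(X, X)` returns every pair twice plus a zero diagonal, it may be worth comparing it with `pdist` from the same SciPy module, which returns each unordered pair once. The sketch below is a hedged illustration of that alternative, not part of the answer; `closest_pair` and the index arrays are illustrative names.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
X = rng.random((20, 10))

full = cdist(X, X)               # (20, 20), symmetric, zeros on the diagonal
condensed = pdist(X)             # length 20 * 19 / 2 = 190, no self-distances
rebuilt = squareform(condensed)  # expand back to the (20, 20) square form

# Row and column indices in the same order as the condensed vector
i_idx, j_idx = np.triu_indices(len(X), k=1)
k = condensed.argmin()
closest_pair = (i_idx[k], j_idx[k])  # the two rows of X that are nearest
```

Since the condensed vector has no self-distances, its minimum is the smallest distance between two distinct points.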

Upvotes: 1

James King

Reputation: 66

You can go through each point in sequence:

import numpy as np

X = np.random.rand(20, 10)
no_points = X.shape[0]

# distances[i, j] holds the Euclidean distance between rows i and j of X
distances = np.zeros((no_points, no_points))
for i in range(no_points):
    for j in range(no_points):
        distances[i, j] = np.linalg.norm(X[i, :] - X[j, :])

print(distances, np.max(distances))
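`np.max` gives the largest distance but not which two points produce it. A possible extension, sketched here under the assumption that the farthest pair is what is wanted, fills only the upper triangle and recovers the indices with `np.unravel_index`; `far_i` and `far_j` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
X = rng.random((20, 10))
no_points = X.shape[0]

distances = np.zeros((no_points, no_points))
for i in range(no_points):
    # the matrix is symmetric, so compute each pair once and mirror it
    for j in range(i + 1, no_points):
        d = np.linalg.norm(X[i] - X[j])
        distances[i, j] = d
        distances[j, i] = d

# argmax returns an index into the flattened array;
# unravel_index converts it back to (row, column)
far_i, far_j = np.unravel_index(np.argmax(distances), distances.shape)
print(far_i, far_j, distances[far_i, far_j])
```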

Upvotes: 0
