Reputation: 4353
I have an array:
test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
                      [2.6, 6.4, 5.2, 6.2],
                      [7.2, 6.2, 2.5, 1.7],
                      [8.2, 7.6, 4.2, 7.3] ])
Is it possible to obtain a pandas dataframe of the form:
row_id | row1 | row2         | row3         | row4
row1   | 0.0  | d(row1,row2) | d(row1,row3) | d(row1,row4)
row2   | ...  | 0.0          | ...          | ...
row3   | ...  | ...          | 0.0          | ...
row4   | ...  | ...          | ...          | 0.0
where d(row1, row2) is the Euclidean distance between row1 and row2.
My current approach is to generate a list of all pairs of rows, compute the distance for each pair, and assign each result to the corresponding cell of the dataframe. Is there a better/faster way of doing this?
Upvotes: 0
Views: 310
Reputation: 17156
Use cdist to compute the pairwise distances
Place the resulting 2D array into a Pandas DataFrame
import numpy as np
from scipy.spatial.distance import cdist
import pandas as pd
test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
[2.6, 6.4, 5.2, 6.2],
[7.2, 6.2, 2.5, 1.7],
[8.2, 7.6, 4.2, 7.3] ])
# Use cdist to compute pairwise distances
dist = cdist(test_arr, test_arr)
# Place into Pandas DataFrame
# index and names of columns
names = ['row' + str(i) for i in range(1, dist.shape[0]+1)]
df = pd.DataFrame(dist, columns = names, index = names)
print(df)
Output
Pandas DataFrame
row1 row2 row3 row4
row1 0.000000 5.634714 7.790379 9.523655
row2 5.634714 0.000000 6.981404 5.916925
row3 7.790379 6.981404 0.000000 6.100000
row4 9.523655 5.916925 6.100000 0.000000
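As a quick sanity check on the table above, one entry can be recomputed by hand from the definition of Euclidean distance. This is only a sketch: the names d12 and d12_norm are mine, not part of the answer, and it relies on cdist using the Euclidean metric by default.

```python
import numpy as np
from scipy.spatial.distance import cdist

test_arr = np.array([[1.2, 2.1, 2.3, 4.5],
                     [2.6, 6.4, 5.2, 6.2],
                     [7.2, 6.2, 2.5, 1.7],
                     [8.2, 7.6, 4.2, 7.3]])

# cdist defaults to metric='euclidean'
dist = cdist(test_arr, test_arr)

# d(row1, row2) from the definition: sqrt of the sum of squared differences
# (d12 is an illustrative name, not from the answer)
d12 = np.sqrt(np.sum((test_arr[0] - test_arr[1]) ** 2))

# The same quantity via the vector 2-norm
d12_norm = np.linalg.norm(test_arr[0] - test_arr[1])

print(d12, d12_norm, dist[0, 1])
```

All three printed values should agree with the row1/row2 entry of the DataFrame, and the matrix is symmetric because d(a, b) = d(b, a).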
Upvotes: 1
Reputation: 30579
from scipy import spatial
import numpy as np
test_arr = np.array([ [1.2, 2.1, 2.3, 4.5],
[2.6, 6.4, 5.2, 6.2],
[7.2, 6.2, 2.5, 1.7],
[8.2, 7.6, 4.2, 7.3] ])
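# pdist returns the condensed form: one distance per unordered pair of rows
dist = spatial.distance.pdist(test_arr)
# squareform expands the condensed vector into the full symmetric matrix
spatial.distance.squareform(dist)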
Result:
array([[0. , 5.63471383, 7.79037868, 9.52365476],
[5.63471383, 0. , 6.98140387, 5.91692488],
[7.79037868, 6.98140387, 0. , 6.1 ],
[9.52365476, 5.91692488, 6.1 , 0. ]])
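To get from this array to the labelled dataframe the question asks for, the square matrix can be passed to pd.DataFrame. A minimal sketch, assuming the row1..row4 labels from the question; the variable names condensed, square and names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

test_arr = np.array([[1.2, 2.1, 2.3, 4.5],
                     [2.6, 6.4, 5.2, 6.2],
                     [7.2, 6.2, 2.5, 1.7],
                     [8.2, 7.6, 4.2, 7.3]])

# Condensed distances: n*(n-1)/2 values, i.e. 6 for 4 rows
condensed = pdist(test_arr)

# Full symmetric n x n matrix with zeros on the diagonal
square = squareform(condensed)

# Labels row1..rowN (assumed naming, taken from the question's layout)
names = [f"row{i}" for i in range(1, square.shape[0] + 1)]
df = pd.DataFrame(square, index=names, columns=names)
print(df)
```

Since pdist only computes each pair once, it avoids the duplicated work of computing both d(row1, row2) and d(row2, row1).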
Upvotes: 2
Reputation: 1766
import pandas as pd
from sklearn.metrics.pairwise import euclidean_distances
# test_arr is the array defined in the question
pd.DataFrame(euclidean_distances(test_arr, test_arr))
0 1 2 3
0 0.000000 5.634714 7.790379 9.523655
1 5.634714 0.000000 6.981404 5.916925
2 7.790379 6.981404 0.000000 6.100000
3 9.523655 5.916925 6.100000 0.000000
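If pulling in SciPy or scikit-learn is not an option, the same matrix can be built with plain NumPy broadcasting. This is a different technique from the one above, shown only as a sketch; the names diff, dist and names are illustrative, and the intermediate array has shape (n, n, k), so memory grows quickly for large inputs.

```python
import numpy as np
import pandas as pd

test_arr = np.array([[1.2, 2.1, 2.3, 4.5],
                     [2.6, 6.4, 5.2, 6.2],
                     [7.2, 6.2, 2.5, 1.7],
                     [8.2, 7.6, 4.2, 7.3]])

# diff[i, j, :] = test_arr[i] - test_arr[j], via broadcasting; shape (4, 4, 4)
diff = test_arr[:, None, :] - test_arr[None, :, :]

# Sum squared differences over the last axis, then take the square root
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Row labels as in the question (assumed naming)
names = [f"row{i}" for i in range(1, dist.shape[0] + 1)]
df = pd.DataFrame(dist, index=names, columns=names)
print(df)
```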
Upvotes: 2