Given a data frame:
df =
   car      lat     lon
0    0  22.0397  3.6531
1    1  22.0367  3.5095
2    2  22.0713  3.5346
3    3  22.1249  3.5922
I have calculated the Euclidean distance matrix:
import pandas as pd
from scipy.spatial.distance import squareform, pdist
pd.DataFrame(squareform(pdist(df.iloc[:, 1:])), columns=df.car.unique(), index=df.car.unique())
Now I want to compute the Hausdorff distance and get the corresponding matrix.
I tried:
def hausdorff(p, q):
    p = p  # Need to choose row
    q = q  # Need to choose row
    return hausdorff_distance(p, q, distance="euclidean")

distance_df = squareform(pdist(df.values, hausdorff))
euclidean = pd.DataFrame(distance_df)
There's no need to choose rows; pdist does this for you. It calls the user-supplied function for every pair of rows, so hausdorff simply receives the two row vectors. The only caveat is that hausdorff_distance expects two 2-dimensional arrays as input, so you need to reshape them.
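# Assumption: hausdorff_distance comes from the `hausdorff` package
from hausdorff import hausdorff_distance

def hausdorff(p, q):
    # pdist passes 1-D rows; turn each into a 2-D array holding one (lat, lon) point
    p = p.reshape(-1, 2)
    q = q.reshape(-1, 2)
    return hausdorff_distance(p, q, distance="euclidean")

pd.DataFrame(squareform(pdist(df.iloc[:, 1:], hausdorff)), columns=df.car.unique(), index=df.car.unique())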
Result:
          0         1         2         3
0  0.000000  0.143631  0.122641  0.104728
1  0.143631  0.000000  0.042745  0.120907
2  0.122641  0.042745  0.000000  0.078681
3  0.104728  0.120907  0.078681  0.000000
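Because each set passed along by pdist holds exactly one point, the Hausdorff distance here should coincide with the plain Euclidean distance. A minimal sketch to check this, assuming SciPy's directed_hausdorff as a stand-in for hausdorff_distance (a substitution made for this example only; hausdorff_single is a hypothetical helper name):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff, pdist, squareform

# The four (lat, lon) points from the question, one per car.
points = np.array([
    [22.0397, 3.6531],
    [22.0367, 3.5095],
    [22.0713, 3.5346],
    [22.1249, 3.5922],
])

def hausdorff_single(p, q):
    # pdist passes two 1-D rows; reshape each into a one-point 2-D set.
    p = p.reshape(-1, 2)
    q = q.reshape(-1, 2)
    # Symmetric Hausdorff distance: the larger of the two directed distances.
    return max(directed_hausdorff(p, q)[0], directed_hausdorff(q, p)[0])

hausdorff_matrix = squareform(pdist(points, hausdorff_single))
euclidean_matrix = squareform(pdist(points))  # pdist defaults to Euclidean

# Compare the two matrices element by element.
same = np.allclose(hausdorff_matrix, euclidean_matrix)
print(same)
```

If the two matrices agree, the Hausdorff machinery adds nothing for one-point sets; it only becomes meaningful once each car contributes several points.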
This only compares single points, since single rows are all the function gets from pdist. Depending on what you're trying to achieve, I guess you'll need to supply arrays with more than one row, e.g. all rows for a given car, as in the following example:
import itertools as it
import numpy as np

# pd.np is no longer available in recent pandas versions, so call NumPy directly
df1 = pd.DataFrame({'car': [0,0,1,1,2,2], 'lat': 22+np.random.rand(6), 'lon': 3+np.random.rand(6)})
#   car        lat       lon
#0    0  22.426797  3.006383
#1    0  22.894152  3.558360
#2    1  22.657756  3.969983
#3    1  22.788719  3.969007
#4    2  22.025103  3.854048
#5    2  22.867389  3.760920
cars = df1.car.unique()
p = []
# combinations() yields the pairs in the condensed order that squareform expects
for c in it.combinations(cars, 2):
    p.append(hausdorff_distance(
        df1.loc[df1.car==c[0], ['lat','lon']].to_numpy(),
        df1.loc[df1.car==c[1], ['lat','lon']].to_numpy()))
pd.DataFrame(squareform(p), columns=cars, index=cars)
Result:
          0         1         2
0  0.000000  0.990892  0.917975
1  0.990892  0.000000  0.643188
2  0.917975  0.643188  0.000000
Please note, however, that the directed Hausdorff distance is not symmetric, i.e. in general h(x,y) != h(y,x). hausdorff_distance computes the maximum of h(x,y) and h(y,x), so you can't populate a directed distance matrix from it. If you need the directed values, you can use directed_hausdorff (from scipy.spatial.distance) to create the distance matrix correctly.
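A rough sketch of what such a directed matrix could look like, assuming scipy.spatial.distance.directed_hausdorff and the sample values copied (rounded) from the commented df1 printout; the names tracks, directed and symmetric are made up for this illustration:

```python
import itertools as it

import numpy as np
import pandas as pd
from scipy.spatial.distance import directed_hausdorff

# Sample values copied from the commented df1 printout (rounded to 6 decimals).
df1 = pd.DataFrame({
    "car": [0, 0, 1, 1, 2, 2],
    "lat": [22.426797, 22.894152, 22.657756, 22.788719, 22.025103, 22.867389],
    "lon": [3.006383, 3.558360, 3.969983, 3.969007, 3.854048, 3.760920],
})

cars = df1.car.unique()
# One (n_points, 2) array of (lat, lon) rows per car.
tracks = {c: df1.loc[df1.car == c, ["lat", "lon"]].to_numpy() for c in cars}

# directed.loc[a, b] holds h(a -> b); the diagonal stays 0.
directed = pd.DataFrame(0.0, index=cars, columns=cars)
for a, b in it.permutations(cars, 2):
    # directed_hausdorff returns (distance, index_in_u, index_in_v).
    directed.loc[a, b] = directed_hausdorff(tracks[a], tracks[b])[0]

# The symmetric Hausdorff distance is the element-wise maximum of both directions.
symmetric = pd.DataFrame(
    np.maximum(directed.to_numpy(), directed.to_numpy().T),
    index=cars, columns=cars,
)

print(directed)
print(symmetric)
```

With these sample values the two directions differ noticeably for cars 0 and 1 (roughly 0.99 one way and 0.47 the other), while the element-wise maximum should reproduce the symmetric matrix shown earlier, up to rounding of the inputs.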