frank

Reputation: 3608

Create a distance matrix using my own distance formula in pandas

I have a DataFrame with more than 50 dimensions (columns).

Using Euclidean distance, I can calculate the distance matrix:

import pandas as pd
from scipy.spatial import distance_matrix

df2 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})
# Pairwise Euclidean distances between all rows, labelled by the row index
dm = pd.DataFrame(distance_matrix(df2.values, df2.values), index=df2.index, columns=df2.index)
print(dm)
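To see where weights would fit in, it may help to spell out what distance_matrix computes: for every pair of rows it sums the squared per-column differences and takes the square root. A minimal sketch of that with NumPy broadcasting (the names X, diff and manual are illustrative, not from the question):

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

df2 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})
X = df2.to_numpy(dtype=float)

# diff[i, j, k] = X[i, k] - X[j, k] for every pair of rows (i, j) and column k
diff = X[:, None, :] - X[None, :, :]
# Summing squared differences over the column axis gives plain Euclidean distances
manual = np.sqrt((diff ** 2).sum(axis=-1))

print(pd.DataFrame(manual, index=df2.index, columns=df2.index))
print(np.allclose(manual, distance_matrix(X, X)))  # should agree with scipy
```

Note that diff has shape (n_rows, n_rows, n_columns), so with many rows and 50+ columns this intermediate array can get large.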

I want to put more emphasis on col1, so I would like to use the formula:

sqrt(w1*(x1-x2)^2 + w2*(y1-y2)^2), where x is col1, y is col2, w1 = 0.7 and w2 = 0.3
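As a quick sanity check of the formula on two rows, here is a small sketch; weighted_distance is a hypothetical helper name, and it assumes the weights are listed in the same order as the columns (so it also works unchanged for 50+ columns):

```python
import numpy as np

def weighted_distance(a, b, w):
    """sqrt(w1*(x1-x2)^2 + w2*(y1-y2)^2), generalised to any number of columns."""
    a, b, w = (np.asarray(t, dtype=float) for t in (a, b, w))
    return np.sqrt(np.sum(w * (a - b) ** 2))

w = [0.7, 0.3]  # w1 for col1, w2 for col2 (hypothetical variable name)
# Rows 0 and 1 of df2 are (1, 5) and (2, 6): both differences are 1,
# so this is sqrt(0.7*1 + 0.3*1)
d01 = weighted_distance([1, 5], [2, 6], w)
print(d01)
```

For comparison, the unweighted Euclidean distance between the same two rows is sqrt(2).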

Reading through the documentation, I cannot find a way to implement this change. I am still relatively new to Python, so I wonder how I can implement this using pandas.

Is this possible?

Upvotes: 0

Views: 267

Answers (1)

Stef

Reputation: 30639

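You can use scipy's pdist and supply your own metric as a function of two rows u and v, then expand the condensed result into a square matrix with squareform: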

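import numpy as np
from scipy.spatial.distance import pdist, squareform
w = np.array([0.7, 0.3])  # weights for col1 and col2, in column order
pd.DataFrame(squareform(pdist(df2.values, lambda u, v: np.sqrt((w*(u-v)**2).sum()))), index=df2.index, columns=df2.index)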
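With a Python lambda, pdist calls the function once per pair of rows, which can be slow for large frames. A vectorised alternative, sketched here under the assumption that all weights are non-negative: since sqrt(sum w_k (x_k - y_k)^2) equals sqrt(sum (sqrt(w_k) x_k - sqrt(w_k) y_k)^2), scaling each column by sqrt(w_k) turns the weighted distance into a plain Euclidean one, so the question's original distance_matrix call can be reused (scaled and dm_weighted are illustrative names):

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

df2 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})
w = np.array([0.7, 0.3])  # one non-negative weight per column, in column order

# Multiply each column by sqrt(w_k); plain Euclidean distance on the scaled
# data equals the weighted distance on the original data
scaled = df2.to_numpy(dtype=float) * np.sqrt(w)
dm_weighted = pd.DataFrame(distance_matrix(scaled, scaled), index=df2.index, columns=df2.index)
print(dm_weighted)
```

This should give the same matrix as the pdist version above, while letting scipy do the pairwise loop internally.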

Upvotes: 1
