Reputation: 125
I would like to create my own customized k-nearest-neighbor method.
For this I need a matrix (x : y) that gives the distance for each combination of rows x and y under a given distance function (e.g. Euclidean, based on 7 items of my dataset).
For example, given the data:
        x1  x2  x3
row 1:   1   2   3
row 2:   1   1   1
row 3:   4   2   3
If I select x1 and x2 with the Euclidean distance, the output should be a 3x3 matrix:
1:1 = 0
1:2 = sqrt((1-1)^2 + (2-1)^2) = 1
1:3 = sqrt((1-4)^2 + (2-2)^2) = sqrt(9) = 3
2:1 = 1:2 = 1
2:2 = 0
2:3 = sqrt((1-4)^2 + (1-2)^2) = sqrt(10) ≈ 3.162
3:3 = 0
and so forth...
How can I write this without iterating through the DataFrame?
Thanks in advance for your support.
Upvotes: 2
Views: 4586
Reputation: 33793
You can use scipy.spatial.distance.pdist and scipy.spatial.distance.squareform:
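import pandas as pd
from scipy.spatial.distance import pdist, squareform
# Condensed pairwise distances between rows, using only x1 and x2
dist = pdist(df[['x1', 'x2']], 'euclidean')
# Expand the condensed vector into a full square matrix
df_dist = pd.DataFrame(squareform(dist))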
If you just want an array as your output, and not a DataFrame, just use squareform by itself, without wrapping it in a DataFrame.
The resulting output (as a DataFrame):
     0         1         2
0  0.0  1.000000  3.000000
1  1.0  0.000000  3.162278
2  3.0  3.162278  0.000000
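Since the question asks for a matrix built from "a given function" as a step towards a custom k-nearest-neighbor method, here is a minimal, self-contained sketch of how that might look. It rebuilds the sample data from the question, computes the Euclidean matrix as above, and then passes a user-defined callable to pdist, which accepts any function of two 1-D arrays as its metric. The names manhattan, k, and neighbors are illustrative assumptions, not part of the original answer, and the neighbor lookup via np.argsort is only one possible way to use the matrix.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample data from the question
df = pd.DataFrame(
    {"x1": [1, 1, 4], "x2": [2, 1, 2], "x3": [3, 1, 3]},
    index=["row 1", "row 2", "row 3"],
)
cols = ["x1", "x2"]  # columns used for the distance
X = df[cols].to_numpy(dtype=float)

# Built-in metric, expanded to a full square matrix
euclid = squareform(pdist(X, metric="euclidean"))
df_dist = pd.DataFrame(euclid, index=df.index, columns=df.index)


# Hypothetical custom metric: any function that takes two 1-D arrays
# and returns a number can be passed to pdist.
def manhattan(u, v):
    return np.abs(u - v).sum()


custom = squareform(pdist(X, metric=manhattan))

# Indices of the k nearest neighbors of each row (illustrative).
# The diagonal is set to infinity so a row is never its own neighbor.
k = 2
masked = euclid.copy()
np.fill_diagonal(masked, np.inf)
neighbors = np.argsort(masked, axis=1, kind="stable")[:, :k]

print(df_dist)
print(neighbors)
```

Note that a Python callable is evaluated once per pair of rows, so it is considerably slower than the built-in metric names on large datasets; for common distances, prefer the string form (e.g. 'cityblock' instead of the manhattan function above).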
Upvotes: 6