Reputation: 411
I have 2 Dataframes
DF1-
Name X Y
0 Astonished 0.430 0.890
1 Excited 0.700 0.720
2 Expectant 0.320 0.067
3 Passionate 0.333 0.127
[47 rows * 3 columns]
DF2-
Id X Y
0 1 -0.288453 0.076105
1 4 -0.563453 -0.498895
2 5 -0.788453 -0.673895
3 6 -0.063453 -0.373895
4 7 0.311547 0.376105
[767 rows * 3 columns]
Now what I want to achieve is -
Take the X,Y
from first entry from DF2, iterate it over DF1, calculate Euclidean Distance between each value of X,Y
in DF2.
Find the minimum of all the Euclidean Distance obtained between the two points, save the minimum result somewhere along with the corresponding entry under the name column.
Example-
Say for any tuple of X,Y
in DF2, the minimum Euclidean distance is corresponding to the X,Y
value in the row 0 of DF1, then the result should be, the distance and name Astonished
.
My Attempt-
import pandas as pd
import numpy as np
import csv
mood = pd.read_csv("C:/Users/Desktop/DF1.csv")
song_value = pd.read_csv("C:/Users/Desktop/DF2.csv")
df_temp = mood.loc[:, ['Arousal','Valence']]
df_temp1 = song_value.loc[:, ['Arousal','Valence']]
import scipy
from scipy import spatial
ary = scipy.spatial.distance.cdist(mood.loc[:, ['Arousal','Valence']], song_value.loc[:, ['Arousal','Valence']], metric='euclidean')
print (ary)
Result Obtained -
[[1.08563344 1.70762362 1.98252253 ... 0.64569366 0.47426051 0.83656989]
[1.17967807 1.75556794 2.03922435 ... 0.59326275 0.2469077 0.79334076]
[0.60852124 1.04915517 1.33326431 ... 0.1848471 0.53293637 0.08394834]
...
[1.26151359 1.5500629 1.81168766 ... 0.74070027 0.70209658 0.75277205]
[0.69085994 1.03764923 1.31608627 ... 0.33265268 0.61928227 0.21397822]
[0.84484398 1.11428893 1.38222899 ... 0.48330291 0.69288125 0.3886008 ]]
I have no clue how I should proceed now. Please suggest something.
EDIT - 1
I converted the array in another data frame using
new_series = pd.DataFrame(ary)
print (new_series)
Result -
0 1 2 ... 764 765 766
0 1.085633 1.707624 1.982523 ... 0.645694 0.474261 0.836570
1 1.179678 1.755568 2.039224 ... 0.593263 0.246908 0.793341
2 0.608521 1.049155 1.333264 ... 0.184847 0.532936 0.083948
3 0.623534 1.093331 1.378075 ... 0.124156 0.479393 0.109057
4 0.791926 1.352785 1.636748 ... 0.197403 0.245908 0.398619
5 0.740038 1.260768 1.545785 ... 0.092072 0.304926 0.281791
6 0.923284 1.523395 1.803676 ... 0.415540 0.293217 0.611312
7 1.202447 1.679660 1.962823 ... 0.554256 0.247391 0.703298
8 0.824898 1.343684 1.628727 ... 0.177560 0.222666 0.360980
9 1.191411 1.604942 1.883150 ... 0.570771 0.395957 0.668736
10 0.822236 1.456863 1.708469 ... 0.706252 0.787271 0.823542
11 0.741683 1.371996 1.618916 ... 0.704496 0.835235 0.798964
12 0.346244 0.967891 1.240839 ... 0.376504 0.715617 0.359700
13 0.526096 1.163209 1.421820 ... 0.520190 0.748265 0.579333
14 0.435992 0.890291 1.083229 ... 0.937048 1.254437 0.884499
15 0.600338 1.162469 1.375755 ... 0.876228 1.116301 0.891714
16 0.634254 1.059083 1.226407 ... 1.088393 1.373536 1.058550
17 0.712227 1.284502 1.498187 ... 0.917272 1.117806 0.956957
18 0.194387 0.799728 1.045745 ... 0.666713 1.013563 0.597524
19 0.456000 0.708741 0.865870 ... 1.068296 1.420654 0.973234
20 0.633776 0.632060 0.709202 ... 1.277083 1.645173 1.157765
21 0.192291 0.597749 0.826602 ... 0.831713 1.204117 0.716746
22 0.522033 0.526969 0.645998 ... 1.170316 1.546040 1.041762
23 0.668148 0.504480 0.547920 ... 1.316602 1.698041 1.176933
24 0.718440 0.285718 0.280984 ... 1.334008 1.727796 1.166364
25 0.759187 0.265412 0.217165 ... 1.362786 1.757580 1.190132
26 0.598326 0.113459 0.380513 ... 1.087573 1.479296 0.896239
27 0.676841 0.263613 0.474246 ... 1.074911 1.456515 0.875707
28 0.865641 0.365394 0.462742 ... 1.239941 1.612779 1.038790
29 0.463623 0.511737 0.786284 ... 0.719525 1.099122 0.519226
30 0.780386 0.550483 0.750532 ... 0.987863 1.336760 0.788449
31 1.077559 0.711697 0.814205 ... 1.274933 1.602953 1.079529
32 1.020408 0.497152 0.522999 ... 1.372444 1.736938 1.170889
33 0.963911 0.367018 0.336035 ... 1.398444 1.778496 1.198905
34 1.092763 0.759612 0.873457 ... 1.256086 1.574565 1.063570
35 0.903631 0.810449 1.018501 ... 0.921287 1.219046 0.740134
36 0.728728 0.795942 1.045868 ... 0.695317 1.009043 0.512147
37 0.738314 0.600405 0.822742 ... 0.895225 1.239125 0.697393
38 1.206901 1.151385 1.343654 ... 1.064721 1.273002 0.922962
39 1.248530 1.293525 1.508517 ... 0.988508 1.137608 0.880669
40 0.988777 1.205968 1.463036 ... 0.622495 0.776919 0.541414
41 0.941001 1.043940 1.285215 ... 0.732293 0.960420 0.595174
42 1.242508 1.321327 1.544222 ... 0.947970 1.080069 0.851396
43 1.262534 1.399453 1.633948 ... 0.900340 0.989603 0.830024
44 1.261514 1.550063 1.811688 ... 0.740700 0.702097 0.752772
45 0.690860 1.037649 1.316086 ... 0.332653 0.619282 0.213978
46 0.844844 1.114289 1.382229 ... 0.483303 0.692881 0.388601
[47 rows x 767 columns]
Moreover, is this the best approach? Sorry, but am not sure, that's why am putting this up.
Upvotes: 1
Views: 497
Reputation: 1527
Say df_1
and df_2
are your dataframes, first extract your pairs as shown below:
pairs_1 = list(tuple(zip(df_1.X, df_1.Y)))
pairs_2 = list(tuple(zip(df_2.X, df_2.Y)))
Then iterate over pairs as per your use case and get the index of minimum distance for the iterated points:
from scipy import spatial
min_distances = []
closest_pairs = []
names = []
for i in pairs_2:
min_dist = scipy.spatial.distance.cdist([i], pairs_1, metric='euclidean').min()
index_min = scipy.spatial.distance.cdist([i], pairs_1, metric='euclidean').argmin()
min_distances.append(min_dist)
closest_pairs.append(df_1.loc[index_min, ['X', 'Y']])
names.append(df_1.loc[index_min, 'Name'])
Insert results to df_2:
df_2['min_distance'] = min_distances
df_2['closest_pairs'] = [tuple(i.values) for i in closest_pairs]
df_2['name'] = names
df_2
Output:
Id X Y min_distance closest_pairs name
0 1 -0.288453 0.076105 0.608521 (0.32, 0.067) Expectant
1 4 -0.563453 -0.498895 1.049155 (0.32, 0.067) Expectant
2 5 -0.788453 -0.673895 1.333264 (0.32, 0.067) Expectant
3 6 -0.063453 -0.373895 0.584316 (0.32, 0.067) Expectant
4 7 0.311547 0.376105 0.250027 (0.33, 0.127) Passionate
I have added min_distance
and closest_pairs
as well, you can exclude these columns if you want to.
Upvotes: 3