Absolute difference between variables in columns of different lengths

Question

EDIT::::: So I need to employ this for x and y coordinates. So DF1 has 5384167 pairs and the DF2 has 1928 pairs.

I understand how @Divakar 's solution works for one variable. In my specific case I have Subway Coordinates and I'd like to compare their proximity to Crime Coordinates. See if there is a correlation with type of crime and euclidean distance from the closest Subway Station.

I know that sqrt(DF1[x]-DF2[x])**2 + (DF1[y]-DF2[y])**2) needs to be done between all X,Y's in subway locations and each X, Y in crime locations.

Crime Coordinates:

X_COORD_CD Y_COORD_CD 0 1007314.0 241257.0 1 1043991.0 193406.0 2 999463.0 231690.0 3 1060183.0 177862.0 4 987606.0 208148.0

Subway Coordinates:

X_COORD_CD Y_COORD_CD 0 1020671.0 248680.0 1 1019420.0 245867.0 2 1017558.0 245632.0

X_Crime Y_Crime X_Subway Y_Subway Dist_sub1 Dist_sub2 Dist_sub3 1007314 241257 1020671 248680 13356.72213 12105.8096 10243.78646 1043991 193406 1019420 245867 23318.81485 24569.93244 26432.01209 999463 231690 1017558 245632 21207.59944 19956.64481 18094.61475 1060183 177862 39511.10383 40762.16584 42624.20504 987606 208148 33064.38708 31813.40719 29951.37426

so Dist_sub1[0] = ((X_Crime[0] - X_Subway[0])**2 + (Y_Crime[0] - Y_Subway[0])**2)**0.5

Dist_sub2[3] = ((X_Crime[3] - X_Subway[1])**2 + (Y_Crime[3] - Y_Subway[1])**2)**0.5

So the minimum distance to a subway for Crime[0] is 10243.78646 (Subway[2]); Crime[1] is 23318.81485 (Subway[0]); Crime[2] is 18094.61475 (Subway[2]); Crime[3] is 39511.10383 (Subway[0]); Crime[4] is 29951.37426 (Subway[2])

END OF EDIT

I have a list of x and y coordinates in two separate dataframes. One has 5384167 entries and the other has 1928 entries.

I'm trying to figure out how to calculate the absolute distance between each entry in df1 and df2, then find the minimum. Example:

df1 :

df2 :

I know creating a matrix is the way to go but I literally do not know how to get there. The matrix would look like this:

Then I'd get the minimum for each column :

3   2   1

Absolute difference between variables in columns of different lengths

Answers (1)

Related Questions