Reputation: 135
Let's say we have two DataFrames:
DF1
name latitude longitude
0 A 40.730610 -73.935242
1 B 42.095554 -79.238609
2 C 31.442778 -100.450279
DF2
name latitude longitude
0 AA 40.560001 -74.290001
1 BB 33.193611 -117.241112
2 CC 41.676388 -86.250275
3 DD 34.155834 -119.202789
With geopy, one can calculate the distance between two points:
from geopy.distance import geodesic, great_circle
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
print(geodesic(newport_ri, cleveland_oh).miles)
How can I calculate the distance from each of A, B, C in DF1 to every item in DF2 (AA, BB, CC, DD) and save this information in a separate DataFrame or dictionary?
The result should look roughly like this:
from to distance
A AA 5
A BB 2
...
C DD 16
Upvotes: 0
Views: 75
Reputation: 149065
I would first build a Cartesian product of both DataFrames:
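import numpy as np
# give every row the same index value (0) so the join pairs each df1 row with every df2 row
df = df1.set_index(np.zeros(len(df1), 'int')).join(df2.set_index(np.zeros(len(df2), 'int')),
                                                   lsuffix='_1', rsuffix='_2').reset_index(drop=True)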
to get:
name_1 latitude_1 longitude_1 name_2 latitude_2 longitude_2
0 A 40.730610 -73.935242 AA 40.560001 -74.290001
1 A 40.730610 -73.935242 BB 33.193611 -117.241112
2 A 40.730610 -73.935242 CC 41.676388 -86.250275
3 A 40.730610 -73.935242 DD 34.155834 -119.202789
4 B 42.095554 -79.238609 AA 40.560001 -74.290001
5 B 42.095554 -79.238609 BB 33.193611 -117.241112
6 B 42.095554 -79.238609 CC 41.676388 -86.250275
7 B 42.095554 -79.238609 DD 34.155834 -119.202789
8 C 31.442778 -100.450279 AA 40.560001 -74.290001
9 C 31.442778 -100.450279 BB 33.193611 -117.241112
10 C 31.442778 -100.450279 CC 41.676388 -86.250275
11 C 31.442778 -100.450279 DD 34.155834 -119.202789
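On pandas 1.2 or newer, DataFrame.merge also accepts how='cross', which builds the same Cartesian product without the dummy-index trick. A minimal sketch, assuming df1 and df2 are rebuilt from the sample data in the question:

```python
import pandas as pd

# sample data from the question
df1 = pd.DataFrame({'name': ['A', 'B', 'C'],
                    'latitude': [40.730610, 42.095554, 31.442778],
                    'longitude': [-73.935242, -79.238609, -100.450279]})
df2 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'latitude': [40.560001, 33.193611, 41.676388, 34.155834],
                    'longitude': [-74.290001, -117.241112, -86.250275, -119.202789]})

# how='cross' pairs every row of df1 with every row of df2 (requires pandas >= 1.2);
# overlapping column names get the _1 / _2 suffixes
df = df1.merge(df2, how='cross', suffixes=('_1', '_2'))
print(df)
```

The resulting columns and row order match the joined table above, so the distance step below works on either version.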
It is now easy to compute the distances:
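from geopy.distance import geodesic
# geodesic() returns a Distance object; .miles converts it to a float
df['distance'] = df.apply(lambda x: geodesic((x['latitude_1'], x['longitude_1']),
                                             (x['latitude_2'], x['longitude_2'])).miles, axis=1)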
You can now drop the unneeded columns and rename name_1/name_2 to from/to to match the layout in the question.
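If the row-wise apply with geopy becomes slow on large frames, one option is to swap in a different technique: the haversine formula, which treats the Earth as a sphere (so results differ from geodesic by a fraction of a percent), evaluated for all pairs at once with NumPy broadcasting. This is a sketch, not the method above; the helper haversine_miles and the constant EARTH_RADIUS_MILES are hypothetical names introduced for illustration. It also shows the from/to/distance layout and a dictionary version:

```python
import numpy as np
import pandas as pd

# sample data from the question
df1 = pd.DataFrame({'name': ['A', 'B', 'C'],
                    'latitude': [40.730610, 42.095554, 31.442778],
                    'longitude': [-73.935242, -79.238609, -100.450279]})
df2 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'latitude': [40.560001, 33.193611, 41.676388, 34.155834],
                    'longitude': [-74.290001, -117.241112, -86.250275, -119.202789]})

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical approximation)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere (haversine formula), vectorised over arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# broadcasting a column vector (df1) against a row vector (df2)
# gives a 3x4 matrix: rows index df1, columns index df2
dist = haversine_miles(df1['latitude'].to_numpy()[:, None],
                       df1['longitude'].to_numpy()[:, None],
                       df2['latitude'].to_numpy()[None, :],
                       df2['longitude'].to_numpy()[None, :])

# flatten the matrix row by row into the from / to / distance layout
result = pd.DataFrame({
    'from': np.repeat(df1['name'].to_numpy(), len(df2)),
    'to': np.tile(df2['name'].to_numpy(), len(df1)),
    'distance': dist.ravel(),
})

# the same information as a dictionary keyed by (from, to)
distances = {(f, t): d for f, t, d in result.itertuples(index=False)}
print(result)
```

np.repeat and np.tile produce the name columns in the same row-major order that ravel() uses for the matrix, so each distance lines up with its from/to pair.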
Upvotes: 1