Aukru

Reputation: 135

Calculate the distance between two points for every item in DF multiple times

Let's say we have two DataFrames:

DF1

     name  latitude   longitude
0    A     40.730610  -73.935242
1    B     42.095554  -79.238609
2    C     31.442778  -100.450279

DF2

     name  latitude   longitude
0    AA     40.560001  -74.290001
1    BB     33.193611  -117.241112
2    CC     41.676388  -86.250275
3    DD     34.155834  -119.202789

With

from geopy.distance import geodesic, great_circle
newport_ri = (41.49008, -71.312796)
cleveland_oh = (41.499498, -81.695391)
print(geodesic(newport_ri, cleveland_oh).miles)

one can calculate the distance between two points.
How can I calculate the distances from A, B, and C in DF1 to every item in DF2 (AA, BB, CC, DD) and save this information in a different DataFrame or dictionary?

The result should look approximately like this:

   from   to   distance
   A      AA   5
   A      BB   2
   ...
   C      DD   16

Upvotes: 0

Views: 75

Answers (1)

Serge Ballesta

Reputation: 149065

I would first build a cartesian product of both dataframes:

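import numpy as np

# Give every row of both frames the same index value (0), so the join pairs each row of df1 with each row of df2
resul = df1.set_index(np.zeros(len(df1), 'int')).join(df2.set_index(np.zeros(len(df2), 'int')),
                                                lsuffix='_1', rsuffix='_2').reset_index(drop=True)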

to get:

   name_1  latitude_1  longitude_1 name_2  latitude_2  longitude_2
0       A   40.730610   -73.935242     AA   40.560001   -74.290001
1       A   40.730610   -73.935242     BB   33.193611  -117.241112
2       A   40.730610   -73.935242     CC   41.676388   -86.250275
3       A   40.730610   -73.935242     DD   34.155834  -119.202789
4       B   42.095554   -79.238609     AA   40.560001   -74.290001
5       B   42.095554   -79.238609     BB   33.193611  -117.241112
6       B   42.095554   -79.238609     CC   41.676388   -86.250275
7       B   42.095554   -79.238609     DD   34.155834  -119.202789
8       C   31.442778  -100.450279     AA   40.560001   -74.290001
9       C   31.442778  -100.450279     BB   33.193611  -117.241112
10      C   31.442778  -100.450279     CC   41.676388   -86.250275
11      C   31.442778  -100.450279     DD   34.155834  -119.202789
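As an alternative sketch, assuming pandas 1.2 or later, the same cartesian product can probably be built more directly with merge(how="cross"). The frames below are rebuilt from the data in the question, and the name pairs is only illustrative:

```python
import pandas as pd

# Sample frames rebuilt from the data in the question.
df1 = pd.DataFrame({
    "name": ["A", "B", "C"],
    "latitude": [40.730610, 42.095554, 31.442778],
    "longitude": [-73.935242, -79.238609, -100.450279],
})
df2 = pd.DataFrame({
    "name": ["AA", "BB", "CC", "DD"],
    "latitude": [40.560001, 33.193611, 41.676388, 34.155834],
    "longitude": [-74.290001, -117.241112, -86.250275, -119.202789],
})

# how="cross" pairs every row of df1 with every row of df2 (requires pandas >= 1.2).
# The suffixes disambiguate the column names that both frames share.
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))
print(pairs)
```

This avoids the dummy index, at the cost of requiring a newer pandas version.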

It is now easy to compute the distances:

# .miles converts geopy's Distance object to a plain float, as in the question
resul['distance'] = resul.apply(lambda x: geodesic((x['latitude_1'], x['longitude_1']),
                                                   (x['latitude_2'], x['longitude_2'])).miles, axis=1)

You can now drop unneeded columns and/or rename them...
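A minimal sketch of that tidy-up step, under some assumptions: to stay dependency-free it uses a hand-written haversine helper (haversine_miles, a hypothetical name) as a stand-in for geopy's geodesic, so the values will differ slightly from the ellipsoidal result. It also uses only two rows of the cartesian product shown above. The names tidy and lookup are illustrative as well.

```python
import math
import pandas as pd

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles on a sphere (stand-in for geopy's geodesic)."""
    r = 3958.8  # approximate mean Earth radius in miles (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Two rows of the cartesian product, enough to illustrate the tidy-up.
resul = pd.DataFrame({
    "name_1": ["A", "A"],
    "latitude_1": [40.730610, 40.730610],
    "longitude_1": [-73.935242, -73.935242],
    "name_2": ["AA", "BB"],
    "latitude_2": [40.560001, 33.193611],
    "longitude_2": [-74.290001, -117.241112],
})
resul["distance"] = resul.apply(
    lambda x: haversine_miles(x["latitude_1"], x["longitude_1"],
                              x["latitude_2"], x["longitude_2"]),
    axis=1,
)

# Keep only the names and the distance, renamed to the layout asked for in the question.
tidy = resul[["name_1", "name_2", "distance"]].rename(
    columns={"name_1": "from", "name_2": "to"}
)

# Dictionary form, keyed by (from, to) tuples.
lookup = tidy.set_index(["from", "to"])["distance"].to_dict()
print(tidy)
```

With the real data, the haversine helper would simply be replaced by the geodesic call from the answer.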

Upvotes: 1
