tj judge

Reputation: 616

Calculate distance between successive rows, grouped by a column

I am working with the following formula and the resulting dataframe:

    df['dist'] = haversine(df.LAT.shift(), df.LONG.shift(),df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])

The haversine function is defined here: https://stackoverflow.com/a/40453439/15492238

 Group       ID      LAT       LONG         dist
   1         1  74.166061  30.512811          NaN
   1         2  72.249672  33.427724   232.549785
   1         3  67.499828  37.937264   554.905446
   1         4  84.253715  69.328767  1981.896491
   2         5  72.104828  33.823462  1513.397997
   2         6  63.989462  51.918173  1164.481327
   2         7  80.209112  33.530778  1887.256899
   2         8  68.954132  35.981256  1252.531365
   2         9  83.378214  40.619652  1606.340727
   2        10  68.778571   6.607066  1793.921854

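I want to apply the same formula within each group of the Group column, so that the first row of every group gets NaN instead of a distance measured from the last row of the previous group.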

Expected output:

 Group       ID      LAT       LONG         dist
   1         1  74.166061  30.512811          NaN
   1         2  72.249672  33.427724   232.549785
   1         3  67.499828  37.937264   554.905446
   1         4  84.253715  69.328767  1981.896491
   2         5  72.104828  33.823462          NaN
   2         6  63.989462  51.918173  1164.481327
   2         7  80.209112  33.530778  1887.256899
   2         8  68.954132  35.981256  1252.531365
   2         9  83.378214  40.619652  1606.340727
   2        10  68.778571   6.607066  1793.921854

Upvotes: 1

Views: 250

Answers (1)

tlentali

Reputation: 3455

I changed your function slightly so that it returns a DataFrame; a groupby followed by an apply then does the job:

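>>> import numpy as np
>>> import pandas as pd

>>> def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
...     # Inputs are in degrees unless to_radians=False; result is in km for the default radius
...     if to_radians:
...         lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
...     a = np.sin((lat2-lat1)/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
...     # Wrapped in a DataFrame so that the per-group results from apply can be stacked
...     return pd.DataFrame(earth_radius * 2 * np.arcsin(np.sqrt(a)))

>>> # shift() runs inside each group, so the first row of every group has no predecessor
>>> df['dist'] = (df.groupby(["Group"])
...                 .apply(lambda x: haversine(x['LAT'],
...                                            x['LONG'],
...                                            x['LAT'].shift(),
...                                            x['LONG'].shift())).values)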
>>> df
   Group  ID        LAT       LONG         dist
0      1   1  74.166061  30.512811          NaN
1      1   2  72.249672  33.427724   232.695882
2      1   3  67.499828  37.937264   555.254059
3      1   4  84.253715  69.328767  1983.141596
4      2   5  72.104828  33.823462          NaN
5      2   6  63.989462  51.918173  1165.212900
6      2   7  80.209112  33.530778  1888.442548
7      2   8  68.954132  35.981256  1253.318254
8      2   9  83.378214  40.619652  1607.349894
9      2  10  68.778571   6.607066  1795.048866
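
As a possible alternative, the per-group `apply` could be avoided by shifting the coordinates within each group first and then calling a plain array-returning haversine once on whole columns. This is only a minimal sketch: the name `haversine_km` and the helper frame `prev` are illustrative and do not come from the question or the linked answer, and the data is retyped from the question.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371.0):
    """Great-circle distance between points given in degrees (km for the default radius)."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


# Sample data copied from the question
df = pd.DataFrame({
    "Group": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "ID": range(1, 11),
    "LAT": [74.166061, 72.249672, 67.499828, 84.253715, 72.104828,
            63.989462, 80.209112, 68.954132, 83.378214, 68.778571],
    "LONG": [30.512811, 33.427724, 37.937264, 69.328767, 33.823462,
             51.918173, 33.530778, 35.981256, 40.619652, 6.607066],
})

# Previous row's coordinates within each group; the first row of a group gets NaN
prev = df.groupby("Group")[["LAT", "LONG"]].shift()

# NaN inputs propagate through the formula, so group boundaries stay NaN in the result
df["dist"] = haversine_km(prev["LAT"], prev["LONG"], df["LAT"], df["LONG"])
```

Because `prev` keeps the index of `df`, the result lines up row by row without `.values`, and the haversine formula runs once over all rows rather than once per group.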

Upvotes: 1
