Reputation: 85
I am trying to calculate the distance (in km) between successive geolocations given as latitude and longitude. I tried to use the code from this thread: "Pandas Latitude-Longitude to distance between successive rows". However, I run into the error below. Does anyone know how to fix this issue?
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5464 return self[name]
-> 5465 return object.__getattribute__(self, name)
5466
AttributeError: 'Series' object has no attribute 'radians'
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
<ipython-input-56-3c590360590e> in <module>
11
12 df['dist'] = haversine(df.latitude.shift(), df.longitude.shift(),
---> 13 df.loc[1:, 'latitude'], df.loc[1:, 'longitude'])
14
15
<ipython-input-56-3c590360590e> in haversine(lat1, lon1, lat2, lon2, to_radians, earth_radius)
2 def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
3 if to_radians:
----> 4 lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
5
6 a = np.sin((lat2-lat1)/2.0)**2 + \
TypeError: loop of ufunc does not support argument 0 of type Series which has no callable radians method
Here is the data frame:
>>> df_latlon
    latitude   longitude
0  37.405548 -122.078481
1  34.080610  -84.200785
2  37.770830 -122.395463
3  37.773792 -122.409865
4  41.441269  -96.494304
5  41.441269  -96.494304
6  41.441269  -96.494304
7  41.883784  -87.637668
8  26.140780  -80.124434
9  39.960000  -85.983660
Here is the code:
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df_latlon['dist'] = haversine(df_latlon.latitude.shift(), df_latlon.longitude.shift(),
                              df_latlon.loc[1:, 'latitude'], df_latlon.loc[1:, 'longitude'])
Upvotes: 0
Views: 407
Reputation: 5648
I think the issue is that you want to calculate the distance row by row, but passing whole Series into the function this way doesn't seem to work.
Try:
import io
import numpy as np
import pandas as pd

data='''
latitude longitude
0 37.405548 -122.078481
1 34.080610 -84.200785
2 37.770830 -122.395463
3 37.773792 -122.409865
4 41.441269 -96.494304
5 41.441269 -96.494304
6 41.441269 -96.494304
7 41.883784 -87.637668
8 26.140780 -80.124434
9 39.960000 -85.983660'''
# The header has one field fewer than the rows, so the first column becomes the index
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')
# Put the previous row's coordinates next to the current row's
df[['lat2', 'lon2']] = df[['latitude', 'longitude']].shift()

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# Each row now carries four scalars, so np.radians receives plain numbers
df['dist'] = df.apply(lambda x: haversine(x['lat2'], x['lon2'], x['latitude'], x['longitude']), axis=1)
    latitude   longitude       lat2        lon2         dist
0  37.405548 -122.078481        NaN         NaN          NaN
1  34.080610  -84.200785  37.405548 -122.078481  3415.495909
2  37.770830 -122.395463  34.080610  -84.200785  3439.656694
3  37.773792 -122.409865  37.770830 -122.395463     1.307998
4  41.441269  -96.494304  37.773792 -122.409865  2248.480322
5  41.441269  -96.494304  41.441269  -96.494304     0.000000
6  41.441269  -96.494304  41.441269  -96.494304     0.000000
7  41.883784  -87.637668  41.441269  -96.494304   737.041395
8  26.140780  -80.124434  41.883784  -87.637668  1880.578726
9  39.960000  -85.983660  26.140780  -80.124434  1629.746292
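One possible reading of the traceback, offered as a guess rather than a confirmed diagnosis: `shift()` keeps all rows while `.loc[1:]` drops the first one, so the list handed to `np.radians` holds Series of unequal length, and NumPy cannot stack those into a single numeric array. The sketch below uses a shortened copy of the question's data (the four-row frame is only for illustration) and just checks the lengths, then shows that `np.radians` accepts a single Series.

```python
import numpy as np
import pandas as pd

# First four rows of the question's data, for illustration only
df = pd.DataFrame({
    "latitude": [37.405548, 34.080610, 37.770830, 37.773792],
    "longitude": [-122.078481, -84.200785, -122.395463, -122.409865],
})

shifted = df.latitude.shift()     # same length as df, first value is NaN
sliced = df.loc[1:, "latitude"]   # drops row 0, so it is one row shorter

print(len(shifted), len(sliced))  # the two lengths differ by one

# np.radians applied to one Series at a time returns a Series
lat_rad = np.radians(df.latitude)
print(type(lat_rad).__name__)
```

If that reading is right, the row-by-row `apply` above sidesteps the problem because each call only sees four scalars.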
Upvotes: 0
Reputation: 2307
You're passing a Series to the haversine function rather than a single number for each of the lat and lon arguments.
I think you can use apply to run haversine on each row of the dataframe; however, I'm not sure what the best way is for apply to get hold of the next or previous row.
So I'd just add a couple of extra columns, 'from lat' and 'from lon'. Then you have all the data you need on each row.
# add the from lat and lon as extra columns
df_latlon['from lat'] = df_latlon['latitude'].shift(1)
df_latlon['from lon'] = df_latlon['longitude'].shift(1)

def calculate_distance(df_row):
    return haversine(df_row['from lat'], df_row['from lon'], df_row['latitude'], df_row['longitude'])

# pass each row through the haversine function via the calculate_distance
df_latlon['dist'] = df_latlon.apply(calculate_distance, axis=1)
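If the row-by-row apply turns out to be slow on a larger frame, a vectorized variant may also work. The sketch below is an assumption-laden alternative, not the method from the linked thread: it assumes the columns are numeric, converts each argument to radians separately instead of stacking them in a list, and passes four Series of the same length as the frame. The four-row frame is a shortened copy of the question's data, used only for illustration.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """Great-circle distance in km; accepts scalars or equal-length Series."""
    if to_radians:
        # Convert each argument on its own rather than via one stacked list
        lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2.0) ** 2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# First four rows of the question's data, for illustration only
df_latlon = pd.DataFrame({
    "latitude": [37.405548, 34.080610, 37.770830, 37.773792],
    "longitude": [-122.078481, -84.200785, -122.395463, -122.409865],
})

# Previous row's coordinates against the current row's, all full-length Series
df_latlon["dist"] = haversine(
    df_latlon["latitude"].shift(), df_latlon["longitude"].shift(),
    df_latlon["latitude"], df_latlon["longitude"],
)
print(df_latlon)
```

The first row has no predecessor, so its distance stays NaN, the same as with the apply version.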
Upvotes: 1