Reputation: 21
I have a pandas DataFrame like the one in the attached picture:
The postcode columns hold actual postcodes, each with its latitude and longitude. I am trying to calculate the distance from postcode_x to postcode_y.
I wrote a Python function:
import math

def distance(lat_1, lon_1, lat_2, lon_2):
    R = 6373.0  # radius of the Earth
    # coordinates
    lat1 = math.radians(lat_1)
    lon1 = math.radians(lon_1)
    lat2 = math.radians(lat_1)
    lon2 = math.radians(lon_2)
    # change in coordinates
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    return distance
This works fine when I call it with scalar values:
lat_1 = 52.2296756
lon_1 = 21.0122287
lat_2 = 52.406374
lon_2 = 16.9251681
distance(lat_1, lon_1, lat_2, lon_2)
The result is 278.40645089544114.
However, when I try to use it to fill a new column of the DataFrame:
result['distance']=distance(result['LATITUDE_x'],result['LONGITUDE_x'],result['LATITUDE_y'],result['LONGITUDE_y'])
it shows the error:
TypeError: cannot convert the series to <class 'float'>
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-56-44558335aa06> in <module>
----> 1 result['distance']=distance(result['LATITUDE_x'].astype(np.float),result['LONGITUDE_x'].astype(np.float),result['LATITUDE_y'].astype(np.float),result['LONGITUDE_y'].astype(np.float))
2 result
<ipython-input-53-4dddb160b896> in distance(lat_1, lon_1, lat_2, lon_2)
4
5
----> 6 lat1 = math.radians(lat_1)
7 # coordinates
8
c:\python\python 3.95\lib\site-packages\pandas\core\series.py in wrapper(self)
139 if len(self) == 1:
140 return converter(self.iloc[0])
--> 141 raise TypeError(f"cannot convert the series to {converter}")
142
143 wrapper.__name__ = f"__{converter.__name__}__"
TypeError: cannot convert the series to <class 'float'>
I tried:
result['distance']=distance(result['LATITUDE_x'].astype(np.float32),result['LONGITUDE_x'].astype(np.float32),result['LATITUDE_y'].astype(np.float32),result['LONGITUDE_y'].astype(np.float32))
I also tried astype(float) instead of np.float32.
All of them show the same error.
Upvotes: 0
Views: 255
Reputation: 3785
The problem is that your distance() function does not support vectorized operations: the functions in the math module accept only scalars, so you can't pass a pandas Series to them.
To solve it, you have two options: apply the function row by row (for example with df.apply()), or vectorize the function using NumPy (the best approach):
import numpy as np

def distance(lat_1, lon_1, lat_2, lon_2):
    R = 6373.0  # radius of the Earth
    # coordinates
    lat1 = np.radians(lat_1)
    lon1 = np.radians(lon_1)
    lat2 = np.radians(lat_2)  # lat_2 here; the question's code passes lat_1 by mistake
    lon2 = np.radians(lon_2)
    # change in coordinates
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    distance = R * c
    return distance
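To compare the two options, here is a minimal sketch that fills the new column both ways. The column names come from the question; the sample rows, the function names distance_np and distance_scalar, and the distance_rowwise column are assumptions made up for illustration, so adapt them to your own DataFrame.

```python
import math

import numpy as np
import pandas as pd


def distance_np(lat_1, lon_1, lat_2, lon_2):
    """Haversine distance; works on scalars, NumPy arrays, or pandas Series."""
    R = 6373.0  # radius of the Earth, same value as in the question
    lat1, lon1 = np.radians(lat_1), np.radians(lon_1)
    lat2, lon2 = np.radians(lat_2), np.radians(lon_2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


def distance_scalar(lat_1, lon_1, lat_2, lon_2):
    """Same formula with the math module; works on scalars only."""
    R = 6373.0
    lat1, lon1 = math.radians(lat_1), math.radians(lon_1)
    lat2, lon2 = math.radians(lat_2), math.radians(lon_2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Hypothetical sample data; the first row reuses the coordinates from the question.
result = pd.DataFrame({
    "LATITUDE_x": [52.2296756, 51.5074],
    "LONGITUDE_x": [21.0122287, -0.1278],
    "LATITUDE_y": [52.406374, 48.8566],
    "LONGITUDE_y": [16.9251681, 2.3522],
})

# Option 1 (vectorized): pass whole columns; NumPy computes all rows at once.
result["distance"] = distance_np(
    result["LATITUDE_x"], result["LONGITUDE_x"],
    result["LATITUDE_y"], result["LONGITUDE_y"],
)

# Option 2 (row-wise): apply() calls the scalar function once per row.
result["distance_rowwise"] = result.apply(
    lambda row: distance_scalar(
        row["LATITUDE_x"], row["LONGITUDE_x"],
        row["LATITUDE_y"], row["LONGITUDE_y"],
    ),
    axis=1,
)

print(result[["distance", "distance_rowwise"]])
```

Both columns should agree up to floating-point rounding. The row-wise version makes one Python call per row, so it tends to be noticeably slower on large DataFrames, which is why the NumPy version is usually preferable.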
Upvotes: 1