Reputation: 12529
I am trying to calculate the distance between two lat/long pairs with the haversine formula. I pass a Series for each of the last two function arguments because I want to compute this for many coordinates stored in two pandas columns. I get the following error: TypeError: ("'Series' object is not callable", u'occurred at index 0')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import radians, cos, sin, asin, sqrt
origin_lat = 51.507200
origin_lon = -0.127500
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    r = 6371  # Radius of earth in kilometers. Use 3956 for miles
    return c * r
df['dist_from_org'] = df.apply(haversine(origin_lon, origin_lat, df['ulong'], df['ulat']), axis=1)
The series from the df look like this:
+----+---------+----------+
| | ulat | ulong |
+----+---------+----------+
| 0 | 52.6333 | 1.30000 |
| 1 | 51.4667 | -0.35000 |
| 2 | 51.5084 | -0.12550 |
| 3 | 51.8833 | 0.56670 |
| 4 | 51.7667 | -1.38330 |
| 5 | 55.8667 | -2.10000 |
| 6 | 55.8667 | -2.10000 |
| 7 | 52.4667 | -1.91670 |
| 8 | 51.8833 | 0.90000 |
| 9 | 53.4083 | -2.14940 |
| 10 | 53.0167 | -1.73330 |
| 11 | 51.4667 | -0.35000 |
| 12 | 51.4667 | -0.35000 |
| 13 | 52.7167 | -1.36670 |
| 14 | 51.4667 | -0.35000 |
| 15 | 52.9667 | -1.16667 |
| 16 | 51.4667 | -0.35000 |
| 17 | 51.8833 | 0.56670 |
| 18 | 51.8833 | 0.56670 |
| 19 | 51.4833 | 0.08330 |
| 20 | 52.0833 | 0.58330 |
| 21 | 52.3000 | -0.70000 |
| 22 | 51.4000 | -0.05000 |
| 23 | 51.9333 | -2.10000 |
| 24 | 51.9000 | -0.43330 |
| 25 | 53.4809 | -2.23740 |
| 26 | 51.4853 | -3.18670 |
| 27 | 51.2000 | -1.48333 |
| 28 | 51.7779 | -3.21170 |
| 29 | 51.4667 | -0.35000 |
| 30 | 51.7167 | -0.28330 |
| 31 | 52.2000 | 0.11670 |
| 32 | 52.4167 | -1.55000 |
| 33 | 56.5000 | -2.96670 |
| 34 | 51.2167 | -1.05000 |
| 35 | 51.8964 | -2.07830 |
+----+---------+----------+
Am I not allowed to pass a Series to df.apply? If that is the case, how can I apply a function row by row and assign the output to a new column?
Upvotes: 0
Views: 2422
Reputation: 109546
You don't need to use apply when calling the function. Just use:
df['dist_from_org'] = haversine(origin_lon, origin_lat, df['ulong'], df['ulat'])
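As for the original error: df.apply expects a function, but haversine(...) had already been called, so apply received the returned Series and tried to call it. If you do want a row-by-row version, a minimal sketch could look like the following. The name haversine_scalar and the three-row sample frame (copied from the table in the question) are only for illustration; it keeps the math functions because each row supplies plain scalars.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine_scalar(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # math.radians works here because every argument is a single number
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: Earth radius used in the question


origin_lat = 51.5072
origin_lon = -0.1275

# Illustrative subset of the question's data (rows 0 to 2)
df = pd.DataFrame({
    "ulat": [52.6333, 51.4667, 51.5084],
    "ulong": [1.3, -0.35, -0.1255],
})

# Pass a callable to apply; the lambda receives one row at a time
df["dist_from_org"] = df.apply(
    lambda row: haversine_scalar(origin_lon, origin_lat, row["ulong"], row["ulat"]),
    axis=1,
)
```

This is usually slower than operating on whole columns, so it is mainly useful when the function cannot be vectorized.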
When I ran your code (using scalar values for origin_lon and origin_lat), I got a TypeError saying the Series cannot be converted to a float. This was caused by the assignment a = ..., because sin and cos from the math module accept only scalars.
I reworked the formulae to apply to series:
a = (dlat.divide(2).apply(sin).pow(2)
     + lat1.apply(cos).multiply(lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))))
Let me know if this works for you.
If origin_lon and origin_lat are constants (as opposed to Series), then use this formula:
a = dlat.divide(2).apply(sin).pow(2) + cos(lat1) * lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))
Because the parameters lon2 and lat2 are pandas Series, dlon and dlat will both be Series objects as well. You then need to use apply on each Series so that the function is applied to every element of the Series.
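An alternative to calling apply on each Series is to swap the math functions for their NumPy equivalents, which operate element-wise on a Series and also accept scalars. This is a sketch of that swapped-in approach, not the formula given above; the name haversine_np, the radius_km parameter and the three-row sample frame (taken from the question's table) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km; arguments may be scalars or Series in decimal degrees."""
    # np.radians, np.sin, np.cos, np.arcsin and np.sqrt all work element-wise
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))


origin_lat = 51.5072
origin_lon = -0.1275

# Illustrative subset of the question's data (rows 0 to 2)
df = pd.DataFrame({
    "ulat": [52.6333, 51.4667, 51.5084],
    "ulong": [1.3, -0.35, -0.1255],
})

# No apply needed: the whole columns are passed in and a Series comes back
df["dist_from_org"] = haversine_np(origin_lon, origin_lat, df["ulong"], df["ulat"])
```

With this version the same function handles both cases discussed above, whether the origin is a pair of constants or a pair of Series.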
Upvotes: 2