metersk

Reputation: 12529

Using pd.apply with a series argument gives TypeError

I am trying to calculate the distance between a fixed origin and a set of lat/long points using the haversine formula. The last two function arguments are Series because the coordinates are stored in two pandas columns and I want the distance for every row. I'm getting the following error: TypeError: ("'Series' object is not callable", u'occurred at index 0')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from math import radians, cos, sin, asin, sqrt

origin_lat = 51.507200
origin_lon = -0.127500

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

df['dist_from_org'] = df.apply(haversine(origin_lon, origin_lat, df['ulong'], df['ulat']), axis=1)

The series from the df look like this:

+----+---------+----------+
|    |  ulat   |  ulong   |
+----+---------+----------+
|  0 | 52.6333 | 1.30000  |
|  1 | 51.4667 | -0.35000 |
|  2 | 51.5084 | -0.12550 |
|  3 | 51.8833 | 0.56670  |
|  4 | 51.7667 | -1.38330 |
|  5 | 55.8667 | -2.10000 |
|  6 | 55.8667 | -2.10000 |
|  7 | 52.4667 | -1.91670 |
|  8 | 51.8833 | 0.90000  |
|  9 | 53.4083 | -2.14940 |
| 10 | 53.0167 | -1.73330 |
| 11 | 51.4667 | -0.35000 |
| 12 | 51.4667 | -0.35000 |
| 13 | 52.7167 | -1.36670 |
| 14 | 51.4667 | -0.35000 |
| 15 | 52.9667 | -1.16667 |
| 16 | 51.4667 | -0.35000 |
| 17 | 51.8833 | 0.56670  |
| 18 | 51.8833 | 0.56670  |
| 19 | 51.4833 | 0.08330  |
| 20 | 52.0833 | 0.58330  |
| 21 | 52.3000 | -0.70000 |
| 22 | 51.4000 | -0.05000 |
| 23 | 51.9333 | -2.10000 |
| 24 | 51.9000 | -0.43330 |
| 25 | 53.4809 | -2.23740 |
| 26 | 51.4853 | -3.18670 |
| 27 | 51.2000 | -1.48333 |
| 28 | 51.7779 | -3.21170 |
| 29 | 51.4667 | -0.35000 |
| 30 | 51.7167 | -0.28330 |
| 31 | 52.2000 | 0.11670  |
| 32 | 52.4167 | -1.55000 |
| 33 | 56.5000 | -2.96670 |
| 34 | 51.2167 | -1.05000 |
| 35 | 51.8964 | -2.07830 |
+----+---------+----------+

Am I not allowed to use a Series in a pd.apply function? If not, how can I apply a function row by row and assign the output to a new column?

Upvotes: 0

Views: 2422

Answers (1)

Alexander

Reputation: 109546

You don't need to use apply when calling the function. Just use:

df['dist_from_org'] = haversine(origin_lon, origin_lat, df['ulong'], df['ulat'])

When I ran your code (using scalar values for origin_lon and origin_lat), I got TypeError: cannot convert the series to <type 'float'>. This was caused by the assignment a = ..., where sin and cos from the math module are called on Series.
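As a small illustration of that error (a sketch only, with a made-up two-element Series rather than your data): the functions in the math module expect a single float, while their NumPy counterparts operate element by element.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for a column of angles in radians.
angles = pd.Series([0.0, math.pi / 2])

# math.sin expects one float, so a multi-element Series raises TypeError.
try:
    math.sin(angles)
    math_sin_failed = False
except TypeError:
    math_sin_failed = True

# np.sin is a ufunc: it is applied to each element and returns a Series.
np_result = np.sin(angles)
```

The same applies to cos, which is why the line computing a fails as soon as lat2 and lon2 are Series.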

I reworked the formulae to apply to series:

a = (dlat.divide(2).apply(sin).pow(2)
     + lat1.apply(cos).multiply(lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))))

Let me know if this works for you.

If origin_lon and origin_lat are constants (as opposed to Series), then use this formula instead:

a = dlat.divide(2).apply(sin).pow(2) + cos(lat1) * lat2.apply(cos).multiply(dlon.divide(2).apply(sin).pow(2))

Because the parameters lon2 and lat2 are pandas Series, dlon and dlat will both be Series objects as well. The math functions only accept single numbers, so you need to call apply on each Series to run the function on every element.
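An alternative to calling apply on each Series is to use NumPy's own sin and cos throughout, which accept scalars and Series alike. The following is a minimal sketch, not a tested drop-in: haversine_np is a hypothetical name, and the three-row frame reuses the first rows of the table in the question. The last statement shows what a row-by-row df.apply would look like (it needs a callable such as a lambda, not the result of calling the function); it should give the same numbers, just more slowly.

```python
import numpy as np
import pandas as pd


def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; accepts scalars, arrays, or Series."""
    # Convert decimal degrees to radians.
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # NumPy ufuncs work element-wise, so no per-element apply is needed.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    r = 6371  # Radius of earth in kilometers.
    return 2 * r * np.arcsin(np.sqrt(a))


origin_lat = 51.507200
origin_lon = -0.127500

# First three rows of the table in the question.
df = pd.DataFrame({
    "ulat": [52.6333, 51.4667, 51.5084],
    "ulong": [1.3000, -0.3500, -0.1255],
})

# Vectorized: one call computes the whole column.
df["dist_from_org"] = haversine_np(origin_lon, origin_lat, df["ulong"], df["ulat"])

# Row-by-row equivalent: apply receives a callable and passes it each row.
row_wise = df.apply(
    lambda row: haversine_np(origin_lon, origin_lat, row["ulong"], row["ulat"]),
    axis=1,
)
```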

Upvotes: 2
