gamzef

Reputation: 37

Haversine Function using Pandas Data Frame

I am new to Python. I am trying to calculate the haversine distance on pandas DataFrames. I have two dataframes. The first one looks like this: [image: first 3 rows of the first dataframe]

The second one: [image: first 3 rows of the second dataframe]

Here is my haversine function.

    from math import radians, cos, sin, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
      # convert decimal degrees to radians 
      lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

      # haversine formula 
      dlon = lon2 - lon1 
      dlat = lat2 - lat1 
      a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
      c = 2 * asin(sqrt(a)) 
      r = 6371 # Radius of earth in kilometers (3956 would give miles).
      return c * r

I used the longitude and latitude values in the first dataframe as centers and drew circles on the map with a radius of 1000 m. First, I want to pass every lon and lat pair in the second dataframe to the haversine function together with the lon and lat of the first row of the first dataframe. Then I will do the same for the other rows of the first dataframe. That way I can find out whether the coordinates in the second dataframe fall inside the circles centered on the coordinates in the first dataframe. It works when I call it like this:

a = haversine(29.023165,40.992752,28.844604,41.113586)
radius = 1.00 # in kilometer
if a <= radius:
    print('Inside the area')
else:
    print('Outside the area')

I could not get my code to process the rows in the order I wanted. I tried passing all the lon and lat values from both dataframes at once, but logically this is wrong (or at least does unnecessary work). I tried the code below (based on the question Haversine Distance Calc using Pandas Data Frame "cannot convert the series to <class 'float'>"), but it gives an error: ('LONGITUDE', 'occurred at index 0').

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers (3956 would give miles).
    return c * r

iskeleler.loc['density'] = iskeleler.apply(lambda row: haversine(iskeleler['lon'], iskeleler['lat'], row['LONGITUDE'], row['LATITUDE']), axis=1)

Can you help me with how I can do this? Thanks in advance.

Upvotes: 0

Views: 1840

Answers (1)

jcaliz

Reputation: 4021

The haversine function you are using expects a single float for each argument, because the functions in math only work on scalars. In your call, iskeleler['lon'] and iskeleler['lat'] are whole Series, not floats.
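To illustrate the scalar-versus-Series point, here is a minimal sketch. The `lon` Series is made-up sample data, not the asker's dataframe. It shows that `math.radians` rejects a Series, while `np.radians` converts every element.

```python
import math

import numpy as np
import pandas as pd

# Made-up sample column standing in for a longitude column.
lon = pd.Series([29.023165, 28.844604])

try:
    math.radians(lon)           # math functions accept a single number only
    scalar_call_failed = False
except TypeError:
    scalar_call_failed = True   # the Series cannot be converted to one float

# NumPy applies the conversion element by element and keeps the Series shape.
lon_rad = np.radians(lon)
```

This is why a haversine function built on `math` has to be called once per row, or rewritten with NumPy functions to work on whole columns.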

This should work to calculate the distance between coordinates in the same row:

iskeleler['density'] = iskeleler.apply(lambda row: haversine(
    row['lon'], row['lat'],
    row['LONGITUDE'], row['LATITUDE']
),axis=1)

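But you are looking for pairwise distances, which would require a for loop, and that is not efficient. Try sklearn.metrics.pairwise.haversine_distances. It expects [latitude, longitude] pairs in radians and returns distances on the unit sphere, so convert the degrees first and multiply the result by the Earth's radius (about 6371 km):

import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# Inputs must be [lat, lon] in radians; the result is scaled to kilometers.
distance_matrix = 6371 * haversine_distances(
    np.radians(iskeleler[['lat', 'lon']].to_numpy()),
    np.radians(iskeleler[['LATITUDE', 'LONGITUDE']].to_numpy())
)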
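If scikit-learn is not available, the same pairwise computation can be sketched with NumPy broadcasting. The frames `centers` and `points`, the helper `haversine_matrix`, and the second sample point are hypothetical names and data chosen for illustration; the first coordinate pair is the one from the question, and the 1 km radius is the asker's.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise great-circle distances in km, shape (len(lat1), len(lat2))."""
    # Column vectors for the first set, row vectors for the second set,
    # so that subtraction broadcasts to a full matrix.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Hypothetical stand-ins for the two dataframes in the question.
centers = pd.DataFrame({'lat': [40.992752], 'lon': [29.023165]})
points = pd.DataFrame({'LATITUDE': [41.113586, 40.993],
                       'LONGITUDE': [28.844604, 29.024]})

dist_km = haversine_matrix(centers['lat'], centers['lon'],
                           points['LATITUDE'], points['LONGITUDE'])
inside = dist_km <= 1.0                    # True where a point lies within 1 km
centers['density'] = inside.sum(axis=1)    # number of points inside each circle
```

Each row of `dist_km` corresponds to one center and each column to one point, so counting the `True` values per row gives the number of points inside each circle.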

If you prefer the table structure, then:

distance_table = pd.DataFrame(
    distance_matrix,
    index=pd.MultiIndex.from_frames(iskeleler[['lat', 'lon']]),
    columns=pd.MultiIndex.from_frames(iskeleler[['LATITUDE', 'LONGITUDE']]),
).stack([0, 1]).reset_index(name='distance')

This is just one example; there are many ways to create the dataframe from the matrix.
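As one alternative, here is a sketch that keeps only the pairs within a given radius. The small `distance_matrix` and the column names `center`, `point`, and `distance_km` are made up for illustration; positions in the matrix are used as identifiers.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 distance matrix in km (rows: centers, columns: points).
distance_matrix = np.array([[0.4, 2.5, 0.9],
                            [3.1, 0.2, 1.7]])

# Row and column positions of every entry within 1 km.
center_idx, point_idx = np.nonzero(distance_matrix <= 1.0)

pairs = pd.DataFrame({
    'center': center_idx,
    'point': point_idx,
    'distance_km': distance_matrix[center_idx, point_idx],
})
```

The positions in `center` and `point` can then be used with `iloc` to look up the matching rows in the original dataframes.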

Upvotes: 1
