Chaudhary

Reputation: 57

Modify the code to loop over another dataset

I am using a haversine_distance function to calculate the distance from each coordinate in a dataset to one specific coordinate (start_lat, start_lon = 40.6976637, -74.1197643).

import numpy as np

def haversine_distance(lat1, lon1, lat2, lon2):
    r = 6371  # mean Earth radius in km
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)

start_lat, start_lon = 40.6976637, -74.1197643
distances_km = []
# One haversine call per row of the dataset
for row in pandas_df.itertuples(index=False):
    distances_km.append(
        haversine_distance(start_lat, start_lon, row.lat, row.lon)
    )
pandas_df['Distance'] = distances_km
pandas_df

This successfully adds a Distance column to my dataset, containing each row's distance from the given point.

Now I want to modify this code so that, instead of the fixed start_lat, start_lon = 40.6976637, -74.1197643, it uses another dataset containing cities.

How can I modify the existing code so that it creates one column per city, using that city's coordinates as the starting point?

The desired output has a separate column for each city, named after the city and holding the distance calculated as above.

Any help is appreciated; I am new to Python!

Cities array, as requested in the comments (name, longitude, latitude):

[['Nanaimo' -123.9364 49.1642]
 ['Prince Rupert' -130.3271 54.3122]
 ['Vancouver' -123.1386 49.2636]
 ['Victoria' -123.3673 48.4275]
 ['Edmonton' -113.4909 53.5445]
 ['Winnipeg' -97.1392 49.8994]
 ['Sarnia' -82.4065 42.9746]
 ['Sarnia' -82.4065 42.9746]
 ['North York' -79.4112 43.7598]
 ['Kingston' -76.4812 44.2305]
 ['St. Catharines' -79.2333 43.1833]
 ['Thunder Bay' -89.2461 48.3822]
 ['Gaspé' -64.4833 48.8333]
 ['Cap-aux-Meules' -61.8607 47.3801]
 ['Kangiqsujuaq' -71.9667 61.6]
 ['Montreal' -73.5534 45.5091]
 ['Quebec City' -71.2074 46.8142]
 ['Rimouski' -68.524 48.4489]
 ['Sept-Îles' -66.3833 50.2167]
 ['Bathurst' -65.6497 47.6186]
 ['Charlottetown' -63.1399 46.24]
 ['Corner Brook' -57.9711 48.9411]
 ['Dartmouth' -63.5714 44.6715]
 ['Lewisporte' -55.0667 49.2333]
 ['Port Hawkesbury' -61.3642 45.6153]
 ['Saint John' -66.0628 45.2796]
 ["St. John's" -52.7072 47.5675]
 ['Sydney' -60.1947 46.1381]
 ['Yarmouth' -66.1175 43.8361]]

Upvotes: 0

Views: 54

Answers (1)

C-3PO

Reputation: 1213

The beauty of Python (and NumPy) is that the same code can handle a single value or a whole array.

To use a different (start_lat, start_lon) for every city, you can keep the code you have now. All you need to do is define start_lat and start_lon as arrays holding one value per city; each call to haversine_distance then returns one distance per city:

# --------------------- Array Initialization ---------------------
import pandas as pd
import numpy  as np
np.random.seed(0)
pandas_df    =  pd.DataFrame(data = {'lat': np.random.rand(100),
                                     'lon': np.random.rand(100)})

start_cities =  pd.DataFrame([['Nanaimo'        , -123.9364 , 49.1642], ['Prince Rupert'   , -130.3271 , 54.3122],
                            ['Vancouver'       , -123.1386 , 49.2636], ['Victoria'        , -123.3673 , 48.4275],
                            ['Edmonton'        , -113.4909 , 53.5445], ['Winnipeg'        , -97.1392  , 49.8994],
                            ['Sarnia'          , -82.4065  , 42.9746], ['Sarnia'          , -82.4065  , 42.9746],
                            ['North York'      , -79.4112  , 43.7598], ['Kingston'        , -76.4812  , 44.2305],
                            ['St. Catharines'  , -79.2333  , 43.1833], ['Thunder Bay'     , -89.2461  , 48.3822],
                            ['Gaspé'           , -64.4833  , 48.8333], ['Cap-aux-Meules'  , -61.8607  , 47.3801],
                            ['Kangiqsujuaq'    , -71.9667  , 61.6   ], ['Montreal'        , -73.5534  , 45.5091],
                            ['Quebec City'     , -71.2074  , 46.8142], ['Rimouski'        , -68.524   , 48.4489],
                            ['Sept-Îles'       , -66.3833  , 50.2167], ['Bathurst'        , -65.6497  , 47.6186],
                            ['Charlottetown'   , -63.1399  , 46.24  ], ['Corner Brook'    , -57.9711  , 48.9411],
                            ['Dartmouth'       , -63.5714  , 44.6715], ['Lewisporte'      , -55.0667  , 49.2333],
                            ['Port Hawkesbury' , -61.3642  , 45.6153], ['Saint John'      , -66.0628  , 45.2796],
                            ["St. John's"      , -52.7072  , 47.5675], ['Sydney'          , -60.1947  , 46.1381],
                            ['Yarmouth'        , -66.1175  , 43.8361]])

# The cities array lists longitude first, then latitude (e.g. Nanaimo: -123.9364, 49.1642)
start_cities.columns = 'names', 'start_lon', 'start_lat'
start_lat = start_cities.start_lat
start_lon = start_cities.start_lon

# --------------------- Same code as before (as promised) ---------------------
def haversine_distance(lat1, lon1, lat2, lon2):
    r = 6371
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2-lat1)
    delta_lambda = np.radians(lon2-lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1-a)))
    return np.round(res, 2)

distances_km = []

# Each call returns one distance per city, because start_lat/start_lon are arrays
for row in pandas_df.itertuples(index=False):
    distances_km.append(
        haversine_distance(start_lat, start_lon, row.lat, row.lon))

# --------------------- Store data ---------------------
distances_km = np.array(distances_km)
for ind, name in enumerate(start_cities.names):
    pandas_df['distance_km_' + name] = distances_km[:,ind]
# print(pandas_df.keys())
# ["lat"                        , "lon"                        ,
#  "distance_km_Nanaimo"        , "distance_km_Prince Rupert"  ,
#  "distance_km_Vancouver"      , "distance_km_Victoria"       ,
#  "distance_km_Edmonton"       , "distance_km_Winnipeg"       ,
#  "distance_km_Sarnia"         , "distance_km_North York"     ,
#  "distance_km_Kingston"       , "distance_km_St. Catharines" ,
#  "distance_km_Thunder Bay"    , "distance_km_Gaspé"          ,
#  "distance_km_Cap-aux-Meules" , "distance_km_Kangiqsujuaq"   ,
#  "distance_km_Montreal"       , "distance_km_Quebec City"    ,
#  "distance_km_Rimouski"       , "distance_km_Sept-Îles"      ,
#  "distance_km_Bathurst"       , "distance_km_Charlottetown"  ,
#  "distance_km_Corner Brook"   , "distance_km_Dartmouth"      ,
#  "distance_km_Lewisporte"     , "distance_km_Port Hawkesbury",
#  "distance_km_Saint John"     , "distance_km_St. John's"     ,
#  "distance_km_Sydney"         , "distance_km_Yarmouth"       ]
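
If the row-by-row loop becomes slow on a larger dataset, the same function can probably be applied to all points and all cities in one call through NumPy broadcasting. The following is only a minimal sketch under a few assumptions: the `points` and `cities` frames are small hypothetical stand-ins for your data, the city columns are named `start_lon` and `start_lat` in the order your array lists them (longitude first), and the `distance_km_` prefix is kept from the code above.

```python
import numpy as np
import pandas as pd

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs are in degrees and may be arrays."""
    r = 6371  # mean Earth radius in km
    phi1 = np.radians(lat1)
    phi2 = np.radians(lat2)
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)

# Hypothetical sample data standing in for pandas_df and the cities dataset
points = pd.DataFrame({'lat': [49.2827, 45.5017],
                       'lon': [-123.1207, -73.5673]})
cities = pd.DataFrame([['Vancouver', -123.1386, 49.2636],
                       ['Montreal', -73.5534, 45.5091]],
                      columns=['names', 'start_lon', 'start_lat'])

# City arrays have shape (n_cities,); point arrays are reshaped to (n_points, 1).
# Broadcasting then yields a (n_points, n_cities) matrix of distances.
dist = haversine_distance(
    cities['start_lat'].to_numpy(),
    cities['start_lon'].to_numpy(),
    points['lat'].to_numpy()[:, None],
    points['lon'].to_numpy()[:, None],
)

# One column per city, aligned with the rows of the points frame
dist_df = pd.DataFrame(dist,
                       columns=[f'distance_km_{name}' for name in cities['names']],
                       index=points.index)
result = pd.concat([points, dist_df], axis=1)
print(result)
```

One thing to watch for: a city that appears twice in the list (Sarnia does in the array above) would produce two columns with the same name here, so it may be worth calling `cities.drop_duplicates()` first.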

Upvotes: 0
