smooth007

Reputation: 11

Using Pandas to calculate distance between coordinates from imported csv

I am trying to import a .csv that contains two columns of location data (lat/long), compute the distance between points, write the distance to a new column, loop the function to the next set of coordinates, and write the output data frame to a new .csv. I have the following code written and it

import pandas as pd
import numpy as np
pd.read_csv("input.csv")

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

lat1 = row['lat1'] #first row of location.lat column here
lon1 = row['lon1'] #first row of location.long column here
lat2 = row['lat2'] #second row of location.lat column here
lon2 = row['lon2'] #second row of location.long column here

print(dist_from_coordinates(lat1, lon1, lat2, lon2), 'km')

df.to_csv('output.csv')

I am receiving the following error:

Traceback (most recent call last):
  File "Test.py", line 22, in <module>
    lat1 = row['lat1'] #first row of location.lat column here
NameError: name 'row' is not defined

Could additional feedback be provided on how to successfully loop this formula through this data?

Upvotes: 1

Views: 6217

Answers (1)

Hari

Reputation: 740

I assume your input.csv has four columns holding the values of lat1, lon1, lat2 and lon2. The output.csv produced here is a separate file that contains those four columns plus a fifth column, the distance. You can do this with a for loop: the method shown here reads each row, calculates the distance, and appends it to an initially empty list, which then becomes the new "Distance" column before output.csv is written. Make changes wherever necessary. Remember that this expects a four-column CSV file with one pair of coordinates per row. Hope this helps.

import pandas as pd
import numpy as np
input_file = "input.csv"
output_file = "output.csv"
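df = pd.read_csv(input_file)                       #Dataframe specification
coord_cols = ['lat1', 'lon1', 'lat2', 'lon2']
df[coord_cols] = df[coord_cols].apply(pd.to_numeric, errors='coerce')  #convert_objects() was removed in pandas 1.0; unparseable values become NaN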

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

new_column = []                    #list that collects one distance per row
for index,row in df.iterrows():
  lat1 = row['lat1'] #latitude of the first point in this row
  lon1 = row['lon1'] #longitude of the first point in this row
  lat2 = row['lat2'] #latitude of the second point in this row
  lon2 = row['lon2'] #longitude of the second point in this row
  value = dist_from_coordinates(lat1, lon1, lat2, lon2)  #get the distance in km
  new_column.append(value)   #add this row's distance to the list

df.insert(4,"Distance",new_column)  #4 is the position of the new column (positions start at 0); "Distance" is the header and new_column holds the values.

df.to_csv(output_file,index = False)   #creates output.csv, overwriting any existing file (the old 'ab' mode appended on every run)
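
As an alternative to the row-by-row loop, the same function can be applied to whole columns at once, since every NumPy call in it works element-wise on a pandas Series. The following is only a minimal sketch under some assumptions: the column names lat1, lon1, lat2 and lon2 are the ones assumed above, and the inline sample data (built with io.StringIO) is hypothetical and merely stands in for input.csv.

```python
import io

import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same Earth radius as in the function above


def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km; accepts scalars or whole columns."""
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical sample rows standing in for input.csv
sample_csv = io.StringIO(
    "lat1,lon1,lat2,lon2\n"
    "0.0,0.0,0.0,0.0\n"
    "0.0,0.0,0.0,90.0\n"
    "48.8566,2.3522,51.5074,-0.1278\n"
)
df = pd.read_csv(sample_csv)

# Pass the four columns directly; the result is one distance per row
df["Distance"] = dist_from_coordinates(df["lat1"], df["lon1"], df["lat2"], df["lon2"])
print(df)
```

With a real file you would replace sample_csv with the path to input.csv and finish with df.to_csv("output.csv", index=False). On large files this usually runs much faster than iterrows(), because the arithmetic is done on entire columns instead of one row at a time.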

Upvotes: 3
