sainiak

Reputation: 99

Python: Efficiently converting latitude from ddmm.ssss to decimal degrees

I am converting a text file to netCDF format. I read the data from the text file into a DataFrame in which two of the columns are latitude_GPS and longitude_GPS. The input looks like:

latitude_GPS, longitude_GPS
7537.4536, 3558.4985
7672.1534, 3214.9532

They are measured in ddmm.ssss units, which means that for a value like 7537.4536, '75' is degrees, '37' is minutes and '4536' is seconds. I want to convert them to decimal degrees, except for missing values, which are coded as 999.0.
My current code looks like this:

header_rows = 1

df = pd.read_csv(args.input_file, delim_whitespace=True, skiprows=header_rows, skip_blank_lines=True, names = column_names)

num_rows = sum(1 for line in open(args.input_file) if len(line.strip()) != 0) - header_rows

def lat_lon_gps(col_index):
    return ((int(col_index)/100) + round((int(col_index%100))/60, 4) + round(round(col_index%1, 4)/3600, 4))

check_na = 999.0

i = 0
while i < num_rows:
    if df['latitude_GPS'][i] != check_na:
        df['latitude_GPS'][i] = lat_lon_gps(df['latitude_GPS'][i])

    if df['longitude_GPS'][i] != check_na:
        df['longitude_GPS'][i] = lat_lon_gps(df['longitude_GPS'][i])

    i += 1

The return expression calculates (75 + 37/60 + 4536/3600). The above code returns what I want, but this part takes around 50 minutes to run for a file with 10000 rows. Is there a faster way to do it? Any thoughts would be appreciated.

Upvotes: 1

Views: 223

Answers (1)

spadarian

Reputation: 1624

The problem is that you are iterating over every row and assigning values one element at a time. You should take advantage of the vectorisation provided by pandas and numpy and apply the conversion to whole columns at once.

For example:

import numpy as np
import pandas as pd

df = pd.read_csv(args.input_file,
                 names=['latitude_GPS','longitude_GPS'],
                 skiprows=1)
check_na = 999.0

def lat_lon_gps(coords):
    # coords is a whole column (Series or array) of ddmm.ssss values
    deg = np.floor(coords / 100)                                # dd
    minutes = np.floor(((coords / 100) - deg) * 100)            # mm
    seconds = (((coords / 100) - deg) * 100 - minutes) * 100    # .ssss read as ss.ss
    return deg + minutes / 60 + seconds / 3600

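# Drop rows where either coordinate is the missing-value sentinel
logic = (df.latitude_GPS != check_na) & (df.longitude_GPS != check_na)
df = df[logic].copy()

# Convert whole columns at once instead of looping over rows
df.latitude_GPS = lat_lon_gps(df.latitude_GPS)
df.longitude_GPS = lat_lon_gps(df.longitude_GPS)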
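
Note that the code above drops the rows containing 999.0, whereas the question asks for them to be left unchanged. If the rows have to stay, something along these lines might work. It is only a sketch: it assumes the same reading of ddmm.ssss as lat_lon_gps (the fractional digits are seconds scaled by 100, so 7537.4536 is 75 deg, 37 min, 45.36 s), and the name ddmm_to_degrees and the small sample DataFrame are made up for illustration.

```python
import numpy as np
import pandas as pd

CHECK_NA = 999.0  # missing-value sentinel taken from the question


def ddmm_to_degrees(coords):
    """Convert ddmm.ssss values to decimal degrees (hypothetical helper).

    Assumes the fractional digits are seconds scaled by 100,
    e.g. 7537.4536 -> 75 deg, 37 min, 45.36 s.
    """
    deg = np.floor(coords / 100)       # dd
    rest = coords - deg * 100          # mm.ssss
    minutes = np.floor(rest)           # mm
    seconds = (rest - minutes) * 100   # ss.ss
    return deg + minutes / 60 + seconds / 3600


# Small stand-in for the real file; 999.0 marks a missing value.
df = pd.DataFrame({
    "latitude_GPS": [7537.4536, 999.0, 7672.1534],
    "longitude_GPS": [3558.4985, 3214.9532, 999.0],
})

for col in ["latitude_GPS", "longitude_GPS"]:
    valid = df[col] != CHECK_NA
    # Series.where keeps the converted value where `valid` is True
    # and puts the sentinel back everywhere else.
    df[col] = ddmm_to_degrees(df[col]).where(valid, CHECK_NA)
```

The conversion still runs on whole columns, so it should stay fast. The mask is built per column, so a missing latitude does not affect the longitude in the same row.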

Upvotes: 2
