Reputation: 18929
Hi, I have a legacy db with some positional data. The fields are just text fields with strings like this: 0°25'30"S, 91°7'W. Is there some way I can convert these to two floating point numbers for Decimal Latitude and Decimal Longitude?
EDIT:
So an example would be: 0°25'30"S, 91°7'W -> 0.425, 91.116667, where the original single field position yields two floats.
Any help much appreciated.
Upvotes: 11
Views: 16487
Reputation: 1
Try this one, which deals with one coordinate (latitude or longitude) at a time. It returns valid results for coordinates with the compass direction placed at the beginning or end and with "," as the decimal separator, and it returns the original string if it cannot decode the input.
def dec(coord):
    c = coord.upper()
    s = 1
    # 'in' also catches a direction letter at index 0, which find() > 0 would miss
    if 'S' in c or 'W' in c:
        s = -1
    c = c.replace('N','').replace('E','').replace('S','').replace('W','').replace(',','.').replace(u'°',' ').replace('\'',' ').replace('"',' ')
    a = c.split()
    # pad so that missing minutes and seconds count as zero
    a.extend([0,0,0])
    try:
        return s*(float(a[0])+float(a[1])/60.0+float(a[2])/3600.0)
    except ValueError:
        # not a parsable coordinate: hand back the input unchanged
        return coord
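Since the function above takes one coordinate at a time, the combined field from the question has to be split first. Here is a minimal sketch of that step, assuming each coordinate ends with its hemisphere letter (as in the question's data). The names `COORD` and `split_position` are made up for this illustration, and the sketch would not cope with direction letters placed at the beginning.

```python
import re

# Hypothetical helper pattern: a coordinate is everything up to and
# including the next hemisphere letter (assumes the letter comes last).
COORD = re.compile(r"[^NSEW]*[NSEW]")

def split_position(field):
    """Split a combined 'lat, lon' field into two coordinate strings."""
    # Strip the separator (comma, space or tab) left between the two matches.
    parts = [p.strip(" ,\t") for p in COORD.findall(field.upper())]
    if len(parts) != 2:
        raise ValueError("expected two coordinates, got %r" % field)
    return parts[0], parts[1]
```

Each of the two returned strings can then be passed to a per-coordinate converter such as dec().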
Upvotes: 0
Reputation: 245
You can use the function clean_lat_long() from the library DataPrep if your data is in a DataFrame. Install DataPrep with pip install dataprep.
import pandas as pd
from dataprep.clean import clean_lat_long
df = pd.DataFrame({"coord": ["""0°25'30"S, 91°7'W""", """27°29'04.2"N\t89°19'44.6"E"""]})
df2 = clean_lat_long(df, "coord", split=True)
# print(df2)
                        coord  latitude  longitude
0           0°25'30"S, 91°7'W   -0.4250   -91.1167
1  27°29'04.2"N\t89°19'44.6"E   27.4845    89.3291
Upvotes: 1
Reputation: 35269
This approach can deal with seconds and minutes being absent, and I think handles the compass directions correctly:
# -*- coding: latin-1 -*-
def conversion(old):
    direction = {'N':1, 'S':-1, 'E': 1, 'W':-1}
    new = old.replace(u'°',' ').replace('\'',' ').replace('"',' ')
    new = new.split()
    # the last token is the compass letter
    new_dir = new.pop()
    # pad so that missing minutes and seconds count as zero
    new.extend([0,0,0])
    return (int(new[0])+int(new[1])/60.0+int(new[2])/3600.0) * direction[new_dir]

lat, lon = u'''0°25'30"S, 91°7'W'''.split(', ')
print conversion(lat), conversion(lon)
#Output:
-0.425 -91.1166666667
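The int() calls in conversion() raise a ValueError on fractional seconds such as 04.2, which do turn up in some data. A minimal sketch of a float-based variant follows, written for Python 3. The names `DIRECTION` and `conversion_float` are hypothetical, and the sketch still assumes the compass letter comes last.

```python
DIRECTION = {'N': 1, 'S': -1, 'E': 1, 'W': -1}

def conversion_float(old):
    """Hypothetical variant of conversion() that parses every field as a float."""
    fields = old.replace('°', ' ').replace("'", ' ').replace('"', ' ').split()
    hemisphere = fields.pop()  # last token is the compass letter
    # Pad with zeros so missing minutes or seconds are treated as 0.
    degrees, minutes, seconds = (fields + ['0', '0'])[:3]
    magnitude = float(degrees) + float(minutes) / 60.0 + float(seconds) / 3600.0
    return magnitude * DIRECTION[hemisphere]
```

With this, conversion_float('''27°29'04.2"N''') comes out at roughly 27.4845.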
Upvotes: 19
Reputation: 143022
A simple approach (given that I taught myself about regular expressions just today because of this problem). Deals with missing fields and compass directions.
# -*- coding: latin-1 -*-
import re

s = """0°25'30"S, 91°7'W"""

def compLat_Long(degs, mins, secs, comp_dir):
    return (degs + (mins / 60) + (secs / 3600)) * comp_dir

def extract_DegMinSec(data):
    m = re.search(r'(\d+°)*(\d+\')*(\d+")*', data.strip())
    # a missing group counts as 0.0; [:-1] drops the trailing unit symbol
    deg, mins, secs = [0.0 if m.group(i) is None else float(m.group(i)[:-1]) for i in range(1, 4)]
    # west and south are negative by convention
    comp_dir = -1 if data[-1] in ('W', 'S') else 1
    return deg, mins, secs, comp_dir

s1, s2 = s.split(',')
dms1 = extract_DegMinSec(s1)
dms2 = extract_DegMinSec(s2)
print('{:7.4f} {:7.4f}'.format(compLat_Long(*dms1), compLat_Long(*dms2)))
yields
-0.4250 -91.1167
Upvotes: 1
Reputation: 17275
This converts your input string to your expected output. It can handle minutes and seconds not being present.
Currently, it does not account for North/South, East/West. If you'll tell me how you'd like those handled, I'll update the answer.
# -*- coding: latin-1 -*-
import re

PATTERN = re.compile(r"""(?P<lat_deg>\d+)°      # Latitude Degrees
                         (?:(?P<lat_min>\d+)')? # Latitude Minutes (Optional)
                         (?:(?P<lat_sec>\d+)")? # Latitude Seconds (Optional)
                         (?P<north_south>[NS])  # North or South
                         ,[ ]
                         (?P<lon_deg>\d+)°      # Longitude Degrees
                         (?:(?P<lon_min>\d+)')? # Longitude Minutes (Optional)
                         (?:(?P<lon_sec>\d+)")? # Longitude Seconds (Optional)
                         (?P<east_west>[EW])    # East or West
                      """, re.VERBOSE)

LAT_FIELDS = ("lat_deg", "lat_min", "lat_sec")
LON_FIELDS = ("lon_deg", "lon_min", "lon_sec")

def parse_dms_string(s, out_type=float):
    """
    Convert a string of the following form to a tuple of out_type latitude, longitude.
    Example input:
    0°25'30"S, 91°7'W
    """
    values = PATTERN.match(s).groupdict()
    return tuple(sum(out_type(values[field] or 0) / out_type(60 ** idx) for idx, field in enumerate(field_names))
                 for field_names in (LAT_FIELDS, LON_FIELDS))

INPUT = """0°25'30"S, 91°7'W"""
print parse_dms_string(INPUT) # Prints: (0.42500000000000004, 91.11666666666666)
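The code above captures the hemisphere letters but does not use them. One possible way to fold them in is sketched below for Python 3, assuming that south and west should come out negative (the usual convention, but an assumption here since the question does not say). `COORD`, `SIGN` and `parse_signed` are hypothetical names, and the pattern is a shortened single-coordinate version rather than the PATTERN above.

```python
import re

# Hypothetical compact pattern: one coordinate with optional minutes and seconds.
COORD = re.compile(r"""(?P<deg>\d+)°(?:(?P<min>\d+)')?(?:(?P<sec>\d+)")?(?P<hemi>[NSEW])""")

# Assumption: south and west are negative.
SIGN = {"N": 1, "E": 1, "S": -1, "W": -1}

def parse_signed(s):
    """Return (latitude, longitude) as signed floats for a 'lat, lon' string."""
    matches = list(COORD.finditer(s))
    if len(matches) != 2:
        raise ValueError("expected two coordinates in %r" % s)
    result = []
    for m in matches:
        # degrees / 1 + minutes / 60 + seconds / 3600; absent groups count as 0
        magnitude = sum(float(m.group(name) or 0) / 60 ** idx
                        for idx, name in enumerate(("deg", "min", "sec")))
        result.append(SIGN[m.group("hemi")] * magnitude)
    return tuple(result)
```

For the question's input, parse_signed('''0°25'30"S, 91°7'W''') gives approximately (-0.425, -91.1167). The sketch does not check that the first coordinate is N/S and the second E/W, which the original PATTERN does enforce.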
Upvotes: 2