AulwTheo
AulwTheo

Reputation: 41

How do I separate lat-long coordinate strings into two columns?

I am using this dataset: https://opendataportal-lasvegas.opendata.arcgis.com/datasets/restaurant-inspections-open-data/explore

In this dataset, there is a feature 'Location_1' which involves the coordinates of locations of restaurants:

                     Location_1
0  (36.12976350, -115.31507320)
1  (36.10882500, -115.32125900)
2  (36.07411000, -115.08284000)
3  (36.07411000, -115.08284000)
4  (36.12734880, -115.14185020)

I want to create two new columns in my DataFrame, one for latitude and one for longitude.

I naively tried to generate the column vectors first with the commands

lat = df['Location_1'][:,0]
long = df['Location_1'][:,1]

but it did not work. It looks like that the entries in df['Location_1'] are strings.

For instance, I tried

In [5]: df['Location_1'][1][1:30]
Out[5]: '36.10882500, -115.32125900)'

But I want to get floats in the latitude and longitude columns.

Upvotes: 1

Views: 795

Answers (3)

LingYan Meng
LingYan Meng

Reputation: 774

I downloaded the file and with the following code, the result.csv contains two columns as you expected.

import pandas as pd
df = pd.read_csv('raw_data.csv', low_memory=False)
df[['latitude', 'longitude']] = df['Location_1'].str.extract(pat = '(-?\d+\.\d+),\s*(-?\d+\.\d+)')
df.to_csv('result.csv')

Upvotes: 1

ddejohn
ddejohn

Reputation: 8962

You can use .apply to map pd.Series to each column after processing the strings:

In [2]: df
Out[2]:
                     Location_1
0  (36.12976350, -115.31507320)
1  (36.10882500, -115.32125900)
2  (36.07411000, -115.08284000)
3  (36.07411000, -115.08284000)
4  (36.12734880, -115.14185020)

In [3]: df = df.Location_1.str.strip("()").str.split(", ")

In [4]: df = df.apply(pd.Series).astype(float).rename(columns={0: "lat", 1: "lng"})

In [5]: df
Out[5]:
         lat         lng
0  36.129764 -115.315073
1  36.108825 -115.321259
2  36.074110 -115.082840
3  36.074110 -115.082840
4  36.127349 -115.141850

Or in a single line:

df = df.Location_1.str.strip("()").str.split(", ").apply(pd.Series).astype(float)

Tack on the rename if you want: .rename(columns={0: "lat", 1: "lng"})

Upvotes: 2

Yashar Ahmadov
Yashar Ahmadov

Reputation: 1636

This also works:

df=df['Location_1'].str.split(pat=None, n=-1, expand=True)
df[0] = df[0].str[1:]
df[0] = df[0].str[:-1]
df[1] = df[1].str[:-1]

Upvotes: 1

Related Questions