Jane Borges
Jane Borges

Reputation: 592

How to separate a string in latitude and longitude in a column of a pandas dataframe?

I have the following dataframe:

      import pandas as pd

      df = pd.DataFrame({'id': [1,2,3,4],                   
                         'point':['(-23.481146, -46.3621371, 0.0)',
                                  '(-22.9441554, -45.4070251, 0.0)',
                                  '(-23.2249286, -45.8056902, 0.0)',
                                  '(-23.23, -45.81)']})

      print(df)

      id            point
       1    (-23.481146, -46.3621371, 0.0)
       2    (-22.9441554, -45.4070251, 0.0)
       3    (-23.2249286, -45.8056902, 0.0)
       4    (-23.23, -45.81)

I would like to create two new columns in the dataframe. One to store latitude and one to store longitude, starting from the 'point' column. So I made the following code:

      # New columns
      cols = ['lat', 'long']

      df[cols] = df['point'].apply(lambda p: pd.Series([p[0], p[1]], index=cols))

However, the code is wrong. The output is:

      id    point                             lat   long
       1    (-23.481146, -46.3621371, 0.0)      (   -
       2    (-22.9441554, -45.4070251, 0.0)     (   -
       3    (-23.2249286, -45.8056902, 0.0)     (   -
       4    (-23.23, -45.81)                    (   -

I would like the output to be:

      id    point                                 lat          long
       1    (-23.481146, -46.3621371, 0.0)     -23.481146     -46.3621371
       2    (-22.9441554, -45.4070251, 0.0)    -22.9441554    -45.4070251
       3    (-23.2249286, -45.8056902, 0.0)    -23.2249286    -45.8056902
       4    (-23.23, -45.81)                   -23.23         -45.81

Upvotes: 1

Views: 985

Answers (3)

Corralien
Corralien

Reputation: 120409

Another solution with only Pandas:

data = df['point'].str.strip('()').str.split(',').str[:2]
geom = pd.DataFrame(data.tolist(), columns=['lat', 'long'], index=df.index) \
         .apply(pd.to_numeric, errors='coerce')
df = df.join(geom)

Output:

>>> df
   id                            point        lat       long
0   1   (-23.481146, -46.3621371, 0.0) -23.481146 -46.362137
1   2  (-22.9441554, -45.4070251, 0.0) -22.944155 -45.407025
2   3  (-23.2249286, -45.8056902, 0.0) -23.224929 -45.805690
3   4                 (-23.23, -45.81) -23.230000 -45.810000

Upvotes: 1

user17242583
user17242583

Reputation:

First you need to convert the strings to actual tuples:

import ast
df['point'] = df['point'].apply(ast.literal_eval)

Then you can extract the data. I'll note an even easier way of doing it than what you were trying originally:

df['lat'] = df['point'].str[0]
df['long'] = df['point'].str[1]

Output:

>>> df
   id                            point        lat       long
0   1   (-23.481146, -46.3621371, 0.0) -23.481146 -46.362137
1   2  (-22.9441554, -45.4070251, 0.0) -22.944155 -45.407025
2   3  (-23.2249286, -45.8056902, 0.0) -23.224929 -45.805690
3   4                 (-23.23, -45.81) -23.230000 -45.810000

You can also use this one line version:

df = pd.concat([df, pd.DataFrame.from_records(df['point'].apply(ast.literal_eval).str[:2], columns=['lat', 'long'])], axis=1)

Upvotes: 2

user7864386
user7864386

Reputation:

You can use ast.literal_eval in a list comprehension to parse the strings:

import ast
df[['lat', 'long']] = [ast.literal_eval(x)[:2] for x in df['point']]
df = df.drop(columns='point')

Output:

   id        lat       long
0   1 -23.481146 -46.362137
1   2 -22.944155 -45.407025
2   3 -23.224929 -45.805690
3   4 -23.230000 -45.810000

Upvotes: 2

Related Questions