Reputation: 592
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'id': [1,2,3,4],
'point':['(-23.481146, -46.3621371, 0.0)',
'(-22.9441554, -45.4070251, 0.0)',
'(-23.2249286, -45.8056902, 0.0)',
'(-23.23, -45.81)']})
print(df)
id point
1 (-23.481146, -46.3621371, 0.0)
2 (-22.9441554, -45.4070251, 0.0)
3 (-23.2249286, -45.8056902, 0.0)
4 (-23.23, -45.81)
I would like to create two new columns in the dataframe. One to store latitude and one to store longitude, starting from the 'point' column. So I made the following code:
# New columns
cols = ['lat', 'long']
df[cols] = df['point'].apply(lambda p: pd.Series([p[0], p[1]], index=cols))
However, the code is wrong. The output is:
id point lat long
1 (-23.481146, -46.3621371, 0.0) ( -
2 (-22.9441554, -45.4070251, 0.0) ( -
3 (-23.2249286, -45.8056902, 0.0) ( -
4 (-23.23, -45.81) ( -
I would like the output to be:
id point lat long
1 (-23.481146, -46.3621371, 0.0) -23.481146 -46.3621371
2 (-22.9441554, -45.4070251, 0.0) -22.9441554 -45.4070251
3 (-23.2249286, -45.8056902, 0.0) -23.2249286 -45.8056902
4 (-23.23, -45.81) -23.23 -45.81
Upvotes: 1
Views: 985
Reputation: 120409
Another solution with only Pandas:
data = df['point'].str.strip('()').str.split(',').str[:2]
geom = pd.DataFrame(data.tolist(), columns=['lat', 'long'], index=df.index) \
.apply(pd.to_numeric, errors='coerce')
df = df.join(geom)
Output:
>>> df
id point lat long
0 1 (-23.481146, -46.3621371, 0.0) -23.481146 -46.362137
1 2 (-22.9441554, -45.4070251, 0.0) -22.944155 -45.407025
2 3 (-23.2249286, -45.8056902, 0.0) -23.224929 -45.805690
3 4 (-23.23, -45.81) -23.230000 -45.810000
Upvotes: 1
Reputation:
First you need to convert the strings to actual tuples:
import ast
df['point'] = df['point'].apply(ast.literal_eval)
Then you can extract the data. I'll note an even easier way of doing it than what you were trying originally:
df['lat'] = df['point'].str[0]
df['long'] = df['point'].str[1]
Output:
>>> df
id point lat long
0 1 (-23.481146, -46.3621371, 0.0) -23.481146 -46.362137
1 2 (-22.9441554, -45.4070251, 0.0) -22.944155 -45.407025
2 3 (-23.2249286, -45.8056902, 0.0) -23.224929 -45.805690
3 4 (-23.23, -45.81) -23.230000 -45.810000
You can also use this one line version:
df = pd.concat([df, pd.DataFrame.from_records(df['point'].apply(ast.literal_eval).str[:2], columns=['lat', 'long'])], axis=1)
Upvotes: 2
Reputation:
You can use ast.literal_eval
in a list comprehension to parse the strings:
import ast
df[['lat', 'long']] = [ast.literal_eval(x)[:2] for x in df['point']]
df = df.drop(columns='point')
Output:
id lat long
0 1 -23.481146 -46.362137
1 2 -22.944155 -45.407025
2 3 -23.224929 -45.805690
3 4 -23.230000 -45.810000
Upvotes: 2