Reputation: 81
I am trying to add a column to a GeoDataFrame in GeoPandas (0.4.0) that holds a single value (a Point) taken from a GeoSeries, to be used in further calculations.
However, after creating the new column and assigning the GeoSeries to it directly, I noticed that the new column is filled with NaN.
If I assign the shapely object itself instead, I get the following error: 'AssertionError: Shape of new values must be compatible with manager shape'
Example below:
```python
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# create some geometry
coordinates = {'lng': [1, 2, 3], 'lat': [4, 5, 6], 'loc': ['a', 'b', 'd']}
df = pd.DataFrame(coordinates, columns=['loc', 'lat', 'lng'])
df['geometry'] = df.apply(
    lambda x: Point((x.lat, x.lng)),
    axis=1)

# create point of interest
coordinates_center = {'lng': 2.2, 'lat': 4.8, 'loc': ['c']}
df_center = pd.DataFrame(coordinates_center)
df_center['geometry'] = df_center.apply(
    lambda x: Point((x.lat, x.lng)),
    axis=1)

# check data types
print(type(df_center))
center = df_center['geometry']
print(type(center))
center_point = center[0]
print(type(center_point))

# create new column in main dataframe and assign the point of interest
df.assign(center=center_point)
```
Upvotes: 6
Views: 26137
Reputation: 161
If you are working with shapefiles in GeoPandas:
```python
import geopandas as gpd

gdf = gpd.read_file(input_shp)  # input_shp: path to your shapefile

if 'field_name' not in gdf.columns:  # check if the field exists
    gdf['field_name'] = None  # initialize the field...
    gdf['field_name'] = gdf['field_name'].astype('float64')  # ...as a float column
    gdf['field_name'] = gdf['field_name'].round(decimals=2)  # two decimals (no-op while all NaN)

# then you can access it in iterrows
for index, row in gdf.iterrows():
    gdf.at[index, 'field_name'] = 0  # assign new value to new field

# if you want to save it
gdf.to_file('path')
```
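When every row gets the same value, the `iterrows` loop above is not strictly needed: assigning a scalar broadcasts it to all rows in one step. A minimal sketch, assuming a plain pandas DataFrame as a hypothetical stand-in for the result of `gpd.read_file` (so it runs without a shapefile), comparing the loop with the direct assignment:

```python
import pandas as pd

# Hypothetical stand-in for gpd.read_file(input_shp); column assignment works the same on a GeoDataFrame.
gdf = pd.DataFrame({"name": ["a", "b", "c"]})

# Loop version: initialise a float column, then set each row with .at
looped = gdf.copy()
looped["field_name"] = float("nan")
for index, row in looped.iterrows():
    looped.at[index, "field_name"] = 0

# Vectorised version: the scalar is broadcast to every row
vectorised = gdf.copy()
vectorised["field_name"] = 0.0

print(looped.equals(vectorised))  # True
```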
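Or with apply():
```python
def modify_row(row):
    row = row.copy()  # don't mutate the object apply() passes in
    row['field_name'] = 0
    return row

# axis=1 passes each row to the function (the default, axis=0, passes columns)
modified_gdf = gdf.apply(modify_row, axis=1)
# make sure the result is a GeoDataFrame before writing it out
modified_gdf = gpd.GeoDataFrame(modified_gdf, geometry='geometry', crs=gdf.crs)
modified_gdf.to_file('path')
```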
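The `axis` argument matters here: `DataFrame.apply` hands the function whole columns by default and rows only with `axis=1`. A small sketch on a hypothetical two-row DataFrame shows what goes wrong without it; the `row.copy()` is a defensive choice, since pandas documents that mutating the object passed to `apply` is unsupported:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "field_name": [1.5, 2.5]})

def modify_row(row):
    row = row.copy()          # work on a copy instead of the object apply() passed in
    row["field_name"] = 0
    return row

# Default axis=0: each *column* is passed in, so "field_name" becomes a new
# index label in every column and the result gains an extra row.
by_column = df.apply(modify_row)

# axis=1: each *row* is passed in, which is what modify_row expects.
by_row = df.apply(modify_row, axis=1)

print(by_column.shape)  # (3, 2)
print(by_row)
```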
Upvotes: 0
Reputation: 68186
The magic sauce with (geo)pandas is that it automatically aligns data on the index. When you assign a Series to a column, its values are matched to the data frame's rows by index label, not by position. Your single-value series has only one label, so at most one row can match; every other row gets NaN. If you want to assign a constant value to your new column, use a scalar.
Take for instance (and note the reproducible example I've provided):
```python
import pandas
df = pandas.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]}, index=list('abc'))
s = pandas.Series([6], index=[0])
print(df.assign(C=s))
```
We get:
```
   A  B   C
a  0  3 NaN
b  1  4 NaN
c  2  5 NaN
```
This is because the index of s and the index of df have no labels in common. If there were a single match (since len(s) == 1, there can be at most one), you'd get:
```python
s = pandas.Series([6], index=['b'])
print(df.assign(C=s))
```
```
   A  B    C
a  0  3  NaN
b  1  4  6.0
c  2  5  NaN
```
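To make the alignment visible, `Series.reindex` performs the same label matching explicitly: reindexing `s` onto `df.index` produces exactly the column that `assign` inserts. A short sketch reusing the data from the examples above:

```python
import pandas

df = pandas.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]}, index=list('abc'))
s = pandas.Series([6], index=['b'])

# Match s to df's row labels; labels missing from s become NaN
aligned = s.reindex(df.index)
print(aligned.tolist())  # [nan, 6.0, nan]

# assign() does this alignment implicitly, so both frames are identical
print(df.assign(C=s).equals(df.assign(C=aligned)))  # True
```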
But this isn't what you want, so you should just use a scalar:
```python
print(df.assign(C=6))
```
```
   A  B  C
a  0  3  6
b  1  4  6
c  2  5  6
```
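Applied to the question's data, a sketch assuming a reasonably recent GeoPandas (not checked against 0.4.0): for the "further calculations", the Point can often be used directly, since `GeoSeries.distance` accepts a single geometry. If you do want it stored in every row, wrapping it in a GeoSeries that shares the frame's index makes alignment match all rows, instead of relying on how pandas interprets a bare shapely object. The `dist` and `center` column names are illustrative:

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Same data as the question; points are built as Point(lat, lng) as there
# (shapely's own convention is Point(x, y), usually (lng, lat)).
df = pd.DataFrame({'loc': ['a', 'b', 'd'], 'lat': [4, 5, 6], 'lng': [1, 2, 3]})
gdf = gpd.GeoDataFrame(
    df, geometry=[Point(lat, lng) for lat, lng in zip(df['lat'], df['lng'])]
)
center_point = Point(4.8, 2.2)

# Option 1: use the single Point directly; distance() compares it with every row
# (planar distance, in the coordinates' own units)
gdf['dist'] = gdf.geometry.distance(center_point)

# Option 2: store the Point in every row via a GeoSeries sharing gdf's index,
# so index alignment matches all rows instead of at most one
gdf['center'] = gpd.GeoSeries([center_point] * len(gdf), index=gdf.index)

print(gdf[['loc', 'dist']])
```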
Upvotes: 7