Reputation: 81

Is there a way to add a column to a geopandas dataframe using a single value geoseries?

I am trying to add a column to a GeoDataFrame in Geopandas (0.4.0) holding a single value (a point) taken from a GeoSeries, to be used in further calculations.

However, after simply creating a new column and assigning the GeoSeries to it directly, I noticed that the new column is filled with NaN.

If I use the shapely object itself I receive the following error message: 'AssertionError: Shape of new values must be compatible with manager shape'

Example below:

import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point

# create some geometry
coordinates = {'lng': [1,2,3], 'lat': [4,5,6], 'loc': ['a', 'b', 'd']}
df = pd.DataFrame(coordinates, columns = ['loc', 'lat', 'lng'])


df['geometry'] = df.apply(
    lambda x: Point((x.lat, x.lng)), 
    axis = 1)

# create point of interest
coordinates_center = {'lng': 2.2, 'lat': 4.8, 'loc': ['c']}
df_center = pd.DataFrame(coordinates_center)

df_center['geometry'] = df_center.apply(
    lambda x: Point((x.lat, x.lng)), 
    axis = 1)

# check data type
print (type(df_center))
center = df_center['geometry']
print (type(center))
center_point = center[0]
print (type(center_point))

#create new column in main dataframe and assign the point of interest
df.assign(center=center_point)

Upvotes: 6

Answers (2)

Stanislaw Bidowaniec

Reputation: 161

If you are working on shapefiles with geopandas:

import geopandas as gpd


gdf = gpd.read_file(input_shp)
if 'field_name' not in gdf.columns:  # check if field exists
    gdf['field_name'] = None  # initialize field, float, two decimals
    gdf['field_name'] = gdf['field_name'].astype('float64')
    gdf['field_name'] = gdf['field_name'].round(decimals=2)

# then you can access it in iterrows
for index, row in gdf.iterrows():
    gdf.at[index, 'field_name'] = 0  # assign new value to new field

# if you want to save it
gdf.to_file('path')

Or with apply():

def modify_row(row):
    row['field_name'] = 0
    return row


modified_gdf = gdf.apply(modify_row, axis=1)  # axis=1 passes each row, not each column
modified_gdf.to_file('path')
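
If every row gets the same value, the loop and apply() can probably be skipped altogether, since assigning a scalar to a column broadcasts it to all rows. A minimal sketch, using a plain pandas DataFrame as a stand-in for gdf (the frame, its 'loc' values and the 1.5 are made up for illustration; non-geometry columns of a GeoDataFrame should behave the same way):

```python
import pandas as pd

# Hypothetical stand-in for gdf; only the column handling matters here.
frame = pd.DataFrame({'loc': ['a', 'b', 'd']})

if 'field_name' not in frame.columns:  # check if field exists
    frame['field_name'] = 0.0          # scalar is broadcast to every row, dtype float64

# Update only some rows with a boolean mask instead of iterating.
frame.loc[frame['loc'] == 'b', 'field_name'] = 1.5

print(frame)
```

The mask form is usually preferable to iterrows() when the new value depends only on other columns.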

Upvotes: 0

Paul H

Reputation: 68186

The magic sauce with (geo)pandas is that it automatically aligns data on the index. So it's aligning your single value series with the index of the data frame. At most there could be only one match. If you want to assign a constant value to your new column, use a scalar.

Take for instance (and note the reproducible example I've provided):

import pandas

df = pandas.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]}, index=list('abc'))
s = pandas.Series([6], index=[0])

print(df.assign(C=s))

We get:

   A  B   C
a  0  3 NaN
b  1  4 NaN
c  2  5 NaN

This is because the index of s and the index of df have no labels in common. If there were a single match (at most one is possible, since len(s) == 1), you'd get:

s = pandas.Series([6], index=['b'])

print(df.assign(C=s))

   A  B   C
a  0  3 NaN
b  1  4 6.0
c  2  5 NaN

But this isn't what you want, so you should just use a scalar:

print(df.assign(C=6))
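
If the value is already sitting in a single-element series, as in the question, one way to get a scalar out of it is positional access with .iloc[0]. A sketch using the same toy data as above (the names by_scalar and by_list are just for illustration); the second variant repeats the value in a plain list, which pandas assigns by position instead of aligning on the index:

```python
import pandas

df = pandas.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]}, index=list('abc'))
s = pandas.Series([6], index=[0])

# Option 1: pull the single value out of the series and assign it as a scalar.
by_scalar = df.assign(C=s.iloc[0])

# Option 2: repeat the value once per row; a list has no index, so nothing is aligned.
by_list = df.assign(C=[s.iloc[0]] * len(df))

print(by_scalar)
print(by_list)
```

With a GeoSeries holding one point, center.iloc[0] should hand back the shapely object in the same way. If a given geopandas version refuses the bare object as a scalar (as the error in the question suggests), the list variant is worth trying.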

Upvotes: 7
