Reputation: 417
I want to update the values of selected columns in one GeoPandas GeoDataFrame from another GeoDataFrame. Both have a common key column called 'geometry'.
For example:
import pandas as pd

df1 = pd.DataFrame([["X", 1, 1, 0],
                    ["Y", 0, 1, 0],
                    ["Z", 0, 0, 0],
                    ["Y", 0, 0, 0]],
                   columns=["geometry", "Nonprofit", "Business", "Education"])
df2 = pd.DataFrame([["Y", 1, 1],
                    ["Z", 1, 1]],
                   columns=["geometry", "Non", "Edu"])
Following this answer, I did the following:
df1 = df1.set_index('geometry')
df2 = df2.set_index('geometry')
list_1 = ['Nonprofit', 'Education']
list_2 = ['Non', 'Edu']
df1[list_1].update(df2[list_2])
This gives the wrong result without any warning. How can I fix this?
Notes:
Updating one column at a time (df1['Nonprofit'].update(df2['Non'])) will produce the correct result.
The GeoPandas LineString geometry is replaced by a single character here for simplicity.
Upvotes: 0
Views: 1398
Reputation: 1688
DataFrame.update only updates columns with the same name!
Accordingly, one solution would be to first rename the columns in df2 to match those in df1.
Note that when calling update(), there is no need to specify the target columns in df1: all common columns will be updated. If required, you can specify which columns you want from df2 by using column indexing.
df2 = df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})
df1.update(df2)
# optionally restrict columns:
# df1.update(df2['Nonprofit'])
# alternative short version, leaving df2 untouched
df1.update(df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'}))
gives
          Nonprofit  Business  Education
geometry
X               1.0         1        0.0
Y               1.0         1        1.0
Z               1.0         0        1.0
Y               1.0         0        1.0
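If you would rather keep the two lists from the question than type the mapping by hand, the rename dictionary can be built from them. This is only a sketch: the name `mapping` is my own, it assumes list_1 and list_2 are in matching order, and the duplicate 'Y' row is left out to keep the example small.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")
df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Non", "Edu"],
).set_index("geometry")

list_1 = ["Nonprofit", "Education"]  # target columns in df1
list_2 = ["Non", "Edu"]              # source columns in df2, same order

# Map each df2 column name to its df1 counterpart.
mapping = dict(zip(list_2, list_1))

# Select only the wanted source columns, rename them, then update in place.
df1.update(df2[list_2].rename(columns=mapping))

result = df1.values.tolist()
```

Business is untouched because df2 has no column of that name after the rename.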
The reason your "single column" approach works is that there you're implicitly using Series.update, where there is no such concept as common columns.
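A small side-by-side comparison may make the difference clearer. The frames `target` and `source` below are made-up stand-ins with a unique index, not your real data: DataFrame.update finds no common column labels and changes nothing, while Series.update aligns on the index only and ignores the Series names.

```python
import pandas as pd

target = pd.DataFrame(
    {"Nonprofit": [1, 0, 0], "Education": [0, 0, 0]},
    index=["X", "Y", "Z"],
)
source = pd.DataFrame({"Non": [1, 1], "Edu": [1, 1]}, index=["Y", "Z"])

# DataFrame.update: 'Non' and 'Edu' match no column in target,
# so nothing is updated and no warning is raised.
frame = target.copy()
frame.update(source)
frame_values = frame.values.tolist()

# Series.update: only the index labels are aligned,
# so 'Non' updates 'Nonprofit' for the rows Y and Z.
series = target["Nonprofit"].copy()
series.update(source["Non"])
series_values = series.tolist()
```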
Upvotes: 3