PPR

Reputation: 417

Replace multiple column values in one dataframe by values of another dataframe subjected to a common key column

I want to update the values of selected columns in one GeoPandas dataframe with values from another GeoPandas dataframe. Both dataframes share a common key column called 'geometry'.

For example:

import pandas as pd

df1 = pd.DataFrame([["X", 1, 1, 0],
                    ["Y", 0, 1, 0],
                    ["Z", 0, 0, 0],
                    ["Y", 0, 0, 0]],
                   columns=["geometry", "Nonprofit", "Business", "Education"])

df2 = pd.DataFrame([["Y", 1, 1],
                    ["Z", 1, 1]],
                   columns=["geometry", "Non", "Edu"])


Following this answer, I did the following:

df1 = df1.set_index('geometry')
df2 = df2.set_index('geometry')

list_1 = ['Nonprofit', 'Education']
list_2 = ['Non', 'Edu']

df1[list_1].update(df2[list_2])

This produces the wrong result without any warning. How can I fix this?


Notes:

Updating one column at a time (df1['Nonprofit'].update(df2['Non'])) produces the correct result.

The LineString geometries from GeoPandas are replaced here by single characters for simplicity.

Upvotes: 0

Views: 1398

Answers (1)

amain

Reputation: 1688

DataFrame.update only updates columns with the same name!

Accordingly, one solution would be to first rename the columns in df2 to match those in df1.

Note that when calling update(), there is no need to specify the target columns in df1: all common columns will be updated. If required, you can specify which columns you want from df2 by using column indexing.

df2 = df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})
df1.update(df2)  

# optionally restrict columns:
# df1.update(df2['Nonprofit'])  

# alternative short version, leaving df2 untouched
df1.update(df2.rename(columns={'Non': 'Nonprofit', 'Edu': 'Education'})) 

gives

          Nonprofit  Business  Education
geometry                                
X               1.0         1        0.0
Y               1.0         1        1.0
Z               1.0         0        1.0
Y               1.0         0        1.0
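If you have many column pairs, the same idea can be driven by a single mapping. The following is only a minimal, self-contained sketch using the sample data from the question; the name column_map is my own and hypothetical, not anything pandas requires.

```python
import pandas as pd

# Sample data from the question, indexed by the shared key.
df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")

df2 = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Non", "Edu"],
).set_index("geometry")

# Hypothetical helper mapping: df2 column name -> df1 column name.
column_map = {"Non": "Nonprofit", "Edu": "Education"}

# Take only the mapped columns from df2, rename them to df1's names,
# and update df1 in place. df2 itself is left untouched.
df1.update(df2[list(column_map)].rename(columns=column_map))

nonprofit = df1["Nonprofit"].tolist()
education = df1["Education"].tolist()
business = df1["Business"].tolist()
```

Selecting df2[list(column_map)] before renaming keeps any other columns in df2 from being written into df1 by accident.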

The reason your "single column" approach works is that there you're implicitly using Series.update, where there is no such concept as common columns.
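There is a second thing going on in the multi-column attempt, as far as I can tell: df1[list_1] builds a new DataFrame, so update() changes that temporary object and df1 itself is never touched. A small sketch to illustrate this, assuming the sample data from the question (the names before, subset and df2_renamed are only for this demonstration):

```python
import pandas as pd

df1 = pd.DataFrame(
    [["X", 1, 1, 0], ["Y", 0, 1, 0], ["Z", 0, 0, 0], ["Y", 0, 0, 0]],
    columns=["geometry", "Nonprofit", "Business", "Education"],
).set_index("geometry")

# df2 with its columns already renamed to match df1.
df2_renamed = pd.DataFrame(
    [["Y", 1, 1], ["Z", 1, 1]],
    columns=["geometry", "Nonprofit", "Education"],
).set_index("geometry")

before = df1.copy()

# Selecting a list of columns returns a separate DataFrame.
subset = df1[["Nonprofit", "Education"]]
subset.update(df2_renamed)

# The temporary object was updated, but df1 was not.
subset_updated = subset["Nonprofit"].tolist()
df1_unchanged = df1.equals(before)
```

So even with matching column names, the update has to be called on df1 directly rather than on a column selection of it.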

Upvotes: 3
