Reputation: 986
It is giving me duplicate/suffixed columns. Most questions I am seeing people want a suffix on their data, I am hoping for the opposite.
So far those columns don't seem to be useful and they all seem to end up with NaN
from a previous merge.
I want the merge to work by overwritting if it finds a duplicate column or adding the column to the dataframe if it doesn't exist. I was hoping the overwritting would happen based parameters of the merge but it seems to default to keeping all columns and just suffixing.
data4 = {'col1': [1, 2], 'col2': [3, 4]}
df4 = pd.DataFrame(data=data4)
data5 = {'col1': [1, 2], 'col2': [3, 4], 'col3': [9, 2]}
df5 = pd.DataFrame(data=data5)
df_merged = pd.merge(df4, df5, how="outer", left_on="col1", right_on="col1")
col1 col2_x col2_y col3
0 1 3 3 9
1 2 4 4 2
data6 = {'col1': [1, 2], 'col2': [3, 4], 'col3': [9, 2]}
df6 = pd.DataFrame(data=data6)
pd.merge(df6, df_merged, how="right", left_on="col1", right_on="col1")
col1 col2 col3_x col2_x col2_y col3_y
0 1 3 9 3 3 9
1 2 4 2 4 4 2
Ideally, in the last merge I would like it to overwrite, the left or right dataframe based on the params, instead of trying to preserve data and adding suffixes
.
Upvotes: 0
Views: 940
Reputation: 310
Merge in pandas is similar to 'joins' in SQL. It will result in common columns being renamed with prefixes to hold the values of the 2 merging dataframes.
For your case, you want to update the dataframe instead.
Upvotes: 0
Reputation: 22503
What you need is combine_first
:
data4 = {'col1': [1, 2], 'col2': [3, 4]}
df4 = pd.DataFrame(data=data4)
data5 = {'col1': [1, 2], 'col2': [3, 4], 'col3': [9, 2]}
df5 = pd.DataFrame(data=data5)
data6 = {'col1': [1, 2], 'col2': [3, 4], 'col3': [9, 2]}
df6 = pd.DataFrame(data=data6)
print (df4.combine_first(df5).combine_first(df6))
col1 col2 col3
0 1 3 9.0
1 2 4 2.0
Upvotes: 2