Reputation: 1190
I have two dfs that have same columns and contain same information, but from different sources:
df_orders = pd.DataFrame({'id':[1,2,3],'model':['A1','A3','A6'], 'color':['Red','Blue','Green']})
df_billed = pd.DataFrame({'id':[1,6,7],'model':['A1','A7','B1'], 'color':['Purple','Pink','Red']})
Then I do a merge left on the df_billed by ids and add suffixes as column names overlap:
merge_df = pd.merge(df_billed,df_orders,on='id',how='left',suffixes=('_order','_billed'))
Results in
id|model_order|color_order | model_billed | color_billed
0 1 | A1 | Purple | A1 | Red
1 6 | A7 | Pink | NaN | NaN
2 7 | B1 | Red | NaN | NaN
The column order has more priority when the suffix is _order than billed, and somehow I would like to have a dataframe where if no billed info, then we take the order, and the suffixes are removed:
id|model_billed | color_billed |
0 1 | A1 | Red |
1 6 | A7 | Pink |
2 7 | B1 | Purple |
Ideally I thought of doing a combine_first to coalesce the colums and at the end rename them, but looks a bit dirty in code and looking for another more well-designed solution.
Upvotes: 1
Views: 888
Reputation: 2248
You can just use .fillna()
and use the _order columns to fill the NAs
merge_df['model_billed'] = merge_df['model_billed'].fillna(merge_df['model_order'])
merge_df['color_billed'] = merge_df['color_billed'].fillna(merge_df['color_order'])
merge_df[['id', 'model_billed', 'color_billed']]
id model_billed color_billed
0 1 A1 Red
1 6 A7 Pink
2 7 B1 Red
If there are more such columns, you can just use a loop like this:
col_names = ['model', 'color']
for col in col_names:
merge_df[col+'_billed'] = merge_df[col+'_billed'].fillna(merge_df[col+'_order'])
Upvotes: 2