Alejandro A
Alejandro A

Reputation: 1190

Merge two dfs and coalesce the results for nan values

I have two dfs that have same columns and contain same information, but from different sources:

df_orders = pd.DataFrame({'id':[1,2,3],'model':['A1','A3','A6'], 'color':['Red','Blue','Green']})
df_billed = pd.DataFrame({'id':[1,6,7],'model':['A1','A7','B1'], 'color':['Purple','Pink','Red']})

Then I do a merge left on the df_billed by ids and add suffixes as column names overlap:

merge_df = pd.merge(df_billed,df_orders,on='id',how='left',suffixes=('_order','_billed'))

Results in

     id|model_order|color_order | model_billed | color_billed 
 0   1 |       A1  |    Purple  |         A1   |       Red
 1   6 |       A7  |      Pink  |       NaN    |       NaN
 2   7 |       B1  |       Red  |        NaN   |       NaN

The column order has more priority when the suffix is _order than billed, and somehow I would like to have a dataframe where if no billed info, then we take the order, and the suffixes are removed:

     id|model_billed | color_billed |
 0   1 |        A1   |      Red     |
 1   6 |        A7   |      Pink    |
 2   7 |        B1   |      Purple  |

Ideally I thought of doing a combine_first to coalesce the colums and at the end rename them, but looks a bit dirty in code and looking for another more well-designed solution.

Upvotes: 1

Views: 888

Answers (1)

Shubham Periwal
Shubham Periwal

Reputation: 2248

You can just use .fillna() and use the _order columns to fill the NAs

merge_df['model_billed'] = merge_df['model_billed'].fillna(merge_df['model_order'])
merge_df['color_billed'] = merge_df['color_billed'].fillna(merge_df['color_order'])

Output

merge_df[['id', 'model_billed', 'color_billed']]
    id  model_billed    color_billed
0   1   A1              Red
1   6   A7              Pink
2   7   B1              Red

UPDATE

If there are more such columns, you can just use a loop like this:

col_names = ['model', 'color']
for col in col_names:
    merge_df[col+'_billed'] = merge_df[col+'_billed'].fillna(merge_df[col+'_order'])

Upvotes: 2

Related Questions