Reputation: 1945
I have a main df:
print(df)
item dt_op
0 product_1 2019-01-08
1 product_2 2019-02-08
2 product_1 2019-01-08
...
and a subset of the first one, that contains only one product and two extra columns:
print(df_1)
item dt_op DQN_Pred DQN_Inv
0 product_1 2019-01-08 6 7.0
2 product_1 2019-01-08 2 2.0
...
I create df_1 iteratively in a for loop (that is, df_1 = df.loc[df.item == i] for i in items).
I would like to merge df_1 and df at every step of the iteration, updating df with the two extra columns.
print(final_df)
item dt_op DQN_Pred DQN_Inv
0 product_1 2019-01-08 6 7.0
1 product_2 2019-02-08 nan nan
2 product_1 2019-01-08 2 2.0
...
The nan values should then be updated at the second step of the for loop, in which df_1 contains only product_2.
How can I do it?
Upvotes: 1
Views: 74
Reputation: 75150
IIUC, you can use combine_first with reindex:
final_df=df_1.combine_first(df).reindex(columns=df_1.columns)
item dt_op DQN_Pred DQN_Inv
0 product_1 2019-01-08 6.0 7.0
1 product_2 2019-02-08 NaN NaN
2 product_1 2019-01-08 2.0 2.0
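To show how combine_first could behave across the whole loop, here is a minimal sketch. The fake_results dictionary is a hypothetical stand-in for whatever actually produces DQN_Pred and DQN_Inv, and the values for product_2 are invented for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": ["2019-01-08", "2019-02-08", "2019-01-08"],
})

# Hypothetical per-product results standing in for the real computation.
fake_results = {
    "product_1": {"DQN_Pred": [6, 2], "DQN_Inv": [7.0, 2.0]},
    "product_2": {"DQN_Pred": [4], "DQN_Inv": [5.0]},
}

final_df = df.copy()
for item in df["item"].unique():
    # Subset for the current product, keeping the original index labels.
    df_1 = df.loc[df["item"] == item].copy()
    df_1["DQN_Pred"] = fake_results[item]["DQN_Pred"]
    df_1["DQN_Inv"] = fake_results[item]["DQN_Inv"]
    # Values from df_1 take priority; rows absent from df_1 come from final_df.
    final_df = df_1.combine_first(final_df).reindex(columns=df_1.columns)

print(final_df)
```

Because combine_first aligns on the index, this relies on df_1 keeping the row labels of df, which df.loc[...] does by default.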
Alternatively, using merge, you can use the common keys with left_index=True and right_index=True:
common_keys=df.columns.intersection(df_1.columns).tolist()
final_df=df.merge(df_1,on=common_keys,left_index=True,right_index=True,how='left')
item dt_op DQN_Pred DQN_Inv
0 product_1 2019-01-08 6.0 7.0
1 product_2 2019-02-08 NaN NaN
2 product_1 2019-01-08 2.0 2.0
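As far as I know, newer pandas releases raise a MergeError when on= is combined with left_index/right_index. A variant that aligns on the index only is sketched below, assuming df_1 keeps the index labels of df. The sample data mirrors the question, and the name extra is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": ["2019-01-08", "2019-02-08", "2019-01-08"],
})

# Subset for one product with the two extra columns, as in the question.
df_1 = df.loc[df["item"] == "product_1"].copy()
df_1["DQN_Pred"] = [6, 2]
df_1["DQN_Inv"] = [7.0, 2.0]

common_keys = df.columns.intersection(df_1.columns).tolist()

# Drop the shared columns so only the new ones are brought over,
# then align the two frames purely on their index.
extra = df_1.drop(columns=common_keys)
final_df = df.merge(extra, left_index=True, right_index=True, how="left")

print(final_df)
```

This covers a single step. Repeating the merge inside the loop would create suffixed duplicate columns, so for iterative updates the combine_first approach is likely the better fit.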
Upvotes: 1