Reputation: 556
I have two data frames that I am trying to combine -
Dataframe 1 -
Product Buyer Date Store
TV Person A 9/18/2018 Boston
DVD Person B 4/10/2018 New York
Blue-ray Player Person C 9/19/2018 Boston
Phone Person A 9/18/2018 Boston
Sound System Person C 3/05/2018 Washington
Dataframe 2 -
Product Type Buyer Date Store
TV Person B 5/29/2018 New York
Phone Person A 2/10/2018 Washington
The first dataframe has about 500k rows while the second dataframe has about 80k rows. There are time when the second dataframe has home columns but I am trying to get the final output with to show the same columns as the Dataframe 1 and update the Dataframe 1 rows with Dataframe 2.
The output looks like this -
Product Buyer Date Store
TV Person B 5/29/2018 New York
DVD Person B 4/10/2018 New York
Blue-ray Player Person C 9/19/2018 Boston
Phone Person A 2/10/2018 Washington
Sound System Person C 3/05/2018 Washington
I tried the join but the columns are repeated. Is there an elegant solution to do this?
Edit 1- I have already tried -
pd.merge(df,df_correction, left_on = ['Product'], right_on = ['Product Type'],how = 'outer')
Product Buyer_x Date_x Store_x Product Type Buyer_y Date_y Store_y
TV Person B 5/29/2018 New York TV Person B 5/29/2018 New York
DVD Person B 4/10/2018 New York NaN NaN NaN NaN
Blue-ray Player Person C 9/19/2018 Boston NaN NaN NaN NaN
Phone Person A 2/10/2018 Washington Phone Person A 2/10/2018 Washington
Sound System Person C 3/05/2018 Washington NaN NaN NaN NaN
Upvotes: 0
Views: 111
Reputation: 743
i think combine first is the function you are looking for https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html
can you try:
d1.rename(columns={'ProductType':'Product'}).set_index('Product').combine_first(d2.set_index('Product')).reset_index()
Upvotes: 1