Steve_Greenwood
Steve_Greenwood

Reputation: 556

Pandas merge two dataframe and overwrite rows

I have two data frames that I am trying to combine -

Dataframe 1 -

           Product     Buyer       Date       Store
                TV  Person A  9/18/2018      Boston
               DVD  Person B  4/10/2018    New York
   Blue-ray Player  Person C  9/19/2018      Boston
             Phone  Person A  9/18/2018      Boston
      Sound System  Person C  3/05/2018  Washington

Dataframe 2 -

  Product Type     Buyer       Date       Store
            TV  Person B  5/29/2018    New York
         Phone  Person A  2/10/2018  Washington

The first dataframe has about 500k rows while the second dataframe has about 80k rows. There are time when the second dataframe has home columns but I am trying to get the final output with to show the same columns as the Dataframe 1 and update the Dataframe 1 rows with Dataframe 2.

The output looks like this -

           Product     Buyer       Date       Store
                TV  Person B  5/29/2018    New York
               DVD  Person B  4/10/2018    New York
   Blue-ray Player  Person C  9/19/2018      Boston
             Phone  Person A  2/10/2018  Washington
      Sound System  Person C  3/05/2018  Washington

I tried the join but the columns are repeated. Is there an elegant solution to do this?

Edit 1- I have already tried -

pd.merge(df,df_correction, left_on = ['Product'], right_on = ['Product Type'],how = 'outer')



     Product   Buyer_x     Date_x     Store_x Product Type   Buyer_y     Date_y     Store_y
                TV  Person B  5/29/2018    New York           TV  Person B  5/29/2018    New York
               DVD  Person B  4/10/2018    New York          NaN       NaN        NaN         NaN
   Blue-ray Player  Person C  9/19/2018      Boston          NaN       NaN        NaN         NaN
             Phone  Person A  2/10/2018  Washington        Phone  Person A  2/10/2018  Washington
      Sound System  Person C  3/05/2018  Washington          NaN       NaN        NaN         NaN

Upvotes: 0

Views: 111

Answers (1)

i think combine first is the function you are looking for https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html

can you try:

d1.rename(columns={'ProductType':'Product'}).set_index('Product').combine_first(d2.set_index('Product')).reset_index()

enter image description here

Upvotes: 1

Related Questions