Alok
Alok

Reputation: 3290

merge new data into old data by replacing old data while appending new rows

I have 2 data frames with the same column names. Old data frame old_df and the new data frame is new_df with 1 column as a key.

I am trying to merge the 2 data frames into a single data frame which following conditions.

  1. If the key is missing in the new table, then data from old_df should be taken
  2. if the key is missing in old table, then data from new_df should be added.
  3. If the key is present in both the tables then the data from new_df should overwrite the data from old_df.

Below is my code snippet that I am trying to play with.

new_data = pd.read_csv(filepath)
new_data.set_index(['Name'])
old_data = pd.read_sql_query("select * from dbo.Details", con=engine)
old_data.set_index(['Name'])
merged_result = pd.merge(new_data[['Name','RIC','Volatility','Sector']],
                         old_data,
                         on='Name',
                         how='outer')

I am thinking of using np.where from this point onwards but not sure how to proceed. please advice.

Upvotes: 1

Views: 405

Answers (2)

Yash Nag
Yash Nag

Reputation: 1255

@jezrael's answer looks good. You may also try splitting dataset upon conditions and concatenating the old and new dataframes. In the following example, I'm taking col1 as index and producing results that comply with your question's rules for combination.

import pandas as pd
old_data = {'col1': ['a', 'b', 'c', 'd', 'e'], 'col2': ['A', 'B', 'C', 'D', 'E']}
new_data = {'col1': ['a', 'b', 'e', 'f', 'g'], 'col2': ['V', 'W', 'X', 'Y', 'Z']}
old_df = pd.DataFrame(old_data)
new_df = pd.DataFrame(new_data)

old_df: enter image description here

new_df: enter image description here

Now,

df = pd.concat([new_df, old_df[~old_df['col1'].isin(new_df['col1'])]], axis=0).reset_index(drop=True)

Which gives us

df: enter image description here

Hope this helps.

Upvotes: 1

jezrael
jezrael

Reputation: 863611

I believe you need DataFrame.combine_first with DataFrame.set_index for match by Name columns:

merged_result = (new_data.set_index('Name')[['RIC','Volatility','Sector']]
                 .combine_first(old_data.set_index('Name'))
                 .reset_index())

Sample data:

old_data = pd.DataFrame({'RIC':range(6),
                         'Volatility':[5,3,6,9,2,4], 
                         'Name':list('abcdef')})

print (old_data)
   RIC  Volatility Name
0    0           5    a
1    1           3    b
2    2           6    c
3    3           9    d
4    4           2    e
5    5           4    f

new_data = pd.DataFrame({'RIC':range(4),
                         'Volatility':[10,20,30, 40], 
                         'Name': list('abhi')})

print (new_data)
   RIC  Volatility Name
0    0          10    a
1    1          20    b
2    2          30    h
3    3          40    i

merged_result = (new_data.set_index('Name')
                 .combine_first(old_data.set_index('Name'))
                 .reset_index())
print (merged_result)
  Name  RIC  Volatility
0    a  0.0        10.0
1    b  1.0        20.0
2    c  2.0         6.0
3    d  3.0         9.0
4    e  4.0         2.0
5    f  5.0         4.0
6    h  2.0        30.0
7    i  3.0        40.0

Upvotes: 1

Related Questions