Reputation: 3290
I have 2 data frames with the same column names. Old data frame old_df
and the new data frame is new_df
with 1 column as a key.
I am trying to merge the 2 data frames into a single data frame which following conditions.
old_df
should be taken new_df
should be added. new_df
should overwrite the data from old_df
. Below is my code snippet that I am trying to play with.
new_data = pd.read_csv(filepath)
new_data.set_index(['Name'])
old_data = pd.read_sql_query("select * from dbo.Details", con=engine)
old_data.set_index(['Name'])
merged_result = pd.merge(new_data[['Name','RIC','Volatility','Sector']],
old_data,
on='Name',
how='outer')
I am thinking of using np.where
from this point onwards but not sure how to proceed. please advice.
Upvotes: 1
Views: 405
Reputation: 1255
@jezrael's answer looks good. You may also try splitting dataset upon conditions and concatenating the old and new dataframes. In the following example, I'm taking col1 as index and producing results that comply with your question's rules for combination.
import pandas as pd
old_data = {'col1': ['a', 'b', 'c', 'd', 'e'], 'col2': ['A', 'B', 'C', 'D', 'E']}
new_data = {'col1': ['a', 'b', 'e', 'f', 'g'], 'col2': ['V', 'W', 'X', 'Y', 'Z']}
old_df = pd.DataFrame(old_data)
new_df = pd.DataFrame(new_data)
Now,
df = pd.concat([new_df, old_df[~old_df['col1'].isin(new_df['col1'])]], axis=0).reset_index(drop=True)
Which gives us
Hope this helps.
Upvotes: 1
Reputation: 863611
I believe you need DataFrame.combine_first
with DataFrame.set_index
for match by Name
columns:
merged_result = (new_data.set_index('Name')[['RIC','Volatility','Sector']]
.combine_first(old_data.set_index('Name'))
.reset_index())
Sample data:
old_data = pd.DataFrame({'RIC':range(6),
'Volatility':[5,3,6,9,2,4],
'Name':list('abcdef')})
print (old_data)
RIC Volatility Name
0 0 5 a
1 1 3 b
2 2 6 c
3 3 9 d
4 4 2 e
5 5 4 f
new_data = pd.DataFrame({'RIC':range(4),
'Volatility':[10,20,30, 40],
'Name': list('abhi')})
print (new_data)
RIC Volatility Name
0 0 10 a
1 1 20 b
2 2 30 h
3 3 40 i
merged_result = (new_data.set_index('Name')
.combine_first(old_data.set_index('Name'))
.reset_index())
print (merged_result)
Name RIC Volatility
0 a 0.0 10.0
1 b 1.0 20.0
2 c 2.0 6.0
3 d 3.0 9.0
4 e 4.0 2.0
5 f 5.0 4.0
6 h 2.0 30.0
7 i 3.0 40.0
Upvotes: 1