Reputation: 21244
I need to write code that takes in new dataframes and merges them with an existing one. The rows are ordered by date but unfortunately there is often an overlap. E.g.
Transaction_Date transaction
1330 26/05/2017 2997.71
1327 30/05/2017 -1394.59
1329 30/05/2017 -2650.00
1328 30/05/2017 664.00
and
1329 30/05/2017 -2650.00
1328 30/05/2017 664.00
1326 31/05/2017 374.79
1324 01/06/2017 -160.00
1325 01/06/2017 -27.62
Say the first dataframe is called df1 and the second is called df2. How can I merge them to get rid of the duplicates in the overlapping part?
The expected result should be:
Transaction_Date transaction
1330 26/05/2017 2997.71
1327 30/05/2017 -1394.59
1329 30/05/2017 -2650.00
1328 30/05/2017 664.00
1326 31/05/2017 374.79
1324 01/06/2017 -160.00
1325 01/06/2017 -27.62
Upvotes: 1
Views: 34
Reputation: 862406
I believe you need concat, then remove the rows whose index values are repeated by using Index.duplicated with boolean indexing:
df = pd.concat([df1, df2])
df = df[~df.index.duplicated()]
Full example:
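import pandas as pd
# Seven timestamps 12 hours apart, labelled with the index values from the question
df = pd.DataFrame({
    'date': pd.date_range('2018-01-01', periods=7, freq='12h'),
    'index': [1330, 1327, 1329, 1328, 1326, 1324, 1325]
}).set_index('index')
# Two overlapping slices: rows 1329 and 1328 appear in both
df1 = df.iloc[[0, 1, 2, 3]]
df2 = df.iloc[[2, 3, 4, 5, 6]]
# Stack the frames, then keep only the first occurrence of each index label
df = pd.concat([df1, df2])
df = df[~df.index.duplicated()]
print(df)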
Returns:
date
index
1330 2018-01-01 00:00:00
1327 2018-01-01 12:00:00
1329 2018-01-02 00:00:00
1328 2018-01-02 12:00:00
1326 2018-01-03 00:00:00
1324 2018-01-03 12:00:00
1325 2018-01-04 00:00:00
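For reference, here is the same approach applied to the data from the question. This is only a sketch: it assumes the leftmost column in the question (1330, 1327, ...) is the DataFrame index and that each label identifies exactly one transaction. The names combined and merged are just illustrative.

```python
import pandas as pd

# Rebuild the two frames from the question; the index labels are the row ids.
df1 = pd.DataFrame(
    {
        "Transaction_Date": ["26/05/2017", "30/05/2017", "30/05/2017", "30/05/2017"],
        "transaction": [2997.71, -1394.59, -2650.00, 664.00],
    },
    index=[1330, 1327, 1329, 1328],
)
df2 = pd.DataFrame(
    {
        "Transaction_Date": ["30/05/2017", "30/05/2017", "31/05/2017",
                             "01/06/2017", "01/06/2017"],
        "transaction": [-2650.00, 664.00, 374.79, -160.00, -27.62],
    },
    index=[1329, 1328, 1326, 1324, 1325],
)

# Stack both frames; labels 1329 and 1328 now appear twice.
combined = pd.concat([df1, df2])

# keep="first" is the default: when a label repeats, the row from df1 is kept.
merged = combined[~combined.index.duplicated(keep="first")]
print(merged)
```

If the newer dataframe should win for overlapping rows instead, keep="last" can be passed to duplicated. If the index does not reliably identify a row, this approach would not be safe and the comparison would have to be made on the column values instead.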
Upvotes: 2