How to merge two overlapping dataframes

Question

I need to write code that takes in new dataframes and merges them with an existing one. The rows are ordered by date but unfortunately there is often an overlap. E.g.

    Transaction_Date    transaction
1330    26/05/2017  2997.71
1327    30/05/2017  -1394.59
1329    30/05/2017  -2650.00
1328    30/05/2017  664.00

and

1329    30/05/2017  -2650.00
1328    30/05/2017  664.00
1326    31/05/2017  374.79
1324    01/06/2017  -160.00
1325    01/06/2017  -27.62

Say the first dataframe is called df1 and the second is called df2, how can I merge them to get rid of the duplicates in the overlapping part?

The expected result should be:

    Transaction_Date    transaction
1330    26/05/2017  2997.71
1327    30/05/2017  -1394.59
1329    30/05/2017  -2650.00
1328    30/05/2017  664.00
1326    31/05/2017  374.79
1324    01/06/2017  -160.00
1325    01/06/2017  -27.62

jezrael · Accepted Answer

I believe you need concat, then remove the rows with duplicated index values using Index.duplicated and boolean indexing:

df = pd.concat([df1, df2])
df = df[~df.index.duplicated()]
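To make the mask less opaque, here is a minimal sketch of what Index.duplicated returns for the index labels from the question once the two frames are stacked. The variable names (stacked_index, mask_first, mask_last) are only illustrative.

```python
import pandas as pd

# Index labels of df1 followed by those of df2, as they appear in the question.
stacked_index = pd.Index([1330, 1327, 1329, 1328, 1329, 1328, 1326, 1324, 1325])

# Default keep='first': the second and later occurrences are flagged True.
mask_first = stacked_index.duplicated()

# keep='last' flags the earlier occurrences instead, so rows from df2 would win.
mask_last = stacked_index.duplicated(keep="last")

print(mask_first.tolist())
print(mask_last.tolist())
```

Negating the mask with ~ keeps exactly one row per index label. With the default, the copy from df1 survives; pass keep='last' if the newer frame should take precedence.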

Full example:

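import pandas as pd

# Sample frame whose index reuses the row labels from the question
df = pd.DataFrame({
    'date': pd.date_range('2018-01-01', periods=7, freq='12h'),
    'index': [1330, 1327, 1329, 1328, 1326, 1324, 1325]
}).set_index('index')

# Two slices that overlap on index labels 1329 and 1328
df1 = df.iloc[[0,1,2,3]]
df2 = df.iloc[[2,3,4,5,6]]

# Stack both frames, then keep only the first occurrence of each index label
df = pd.concat([df1, df2])
df = df[~df.index.duplicated()]

print(df)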

Returns:

                     date
index                    
1330  2018-01-01 00:00:00
1327  2018-01-01 12:00:00
1329  2018-01-02 00:00:00
1328  2018-01-02 12:00:00
1326  2018-01-03 00:00:00
1324  2018-01-03 12:00:00
1325  2018-01-04 00:00:00
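
The same two lines can be tried on the data from the question itself. This is a sketch that assumes the leftmost numbers in the question (1330, 1327, ...) are the DataFrame index and that they uniquely identify a transaction. If the index is not a reliable key, this approach would drop or keep the wrong rows.

```python
import pandas as pd

# Data copied from the question; the leftmost column there is used as the index.
df1 = pd.DataFrame(
    {
        "Transaction_Date": ["26/05/2017", "30/05/2017", "30/05/2017", "30/05/2017"],
        "transaction": [2997.71, -1394.59, -2650.00, 664.00],
    },
    index=[1330, 1327, 1329, 1328],
)
df2 = pd.DataFrame(
    {
        "Transaction_Date": ["30/05/2017", "30/05/2017", "31/05/2017",
                             "01/06/2017", "01/06/2017"],
        "transaction": [-2650.00, 664.00, 374.79, -160.00, -27.62],
    },
    index=[1329, 1328, 1326, 1324, 1325],
)

# Stack the frames, then drop every repeated index label (first occurrence wins).
combined = pd.concat([df1, df2])
merged = combined[~combined.index.duplicated(keep="first")]

print(merged)
```

The result has seven rows in the order given in the question's expected output, because concat preserves the order of its inputs and the overlapping rows are removed from the df2 part.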
