user248884
user248884

Reputation: 911

Row wise merge dataframes with overlapping data in python/ pandas

I have two large data frames which I need to merge row-wise. These two dataframes may have a few overlapping rows.

Example:

data1
    key name               patent
    11  Alphagrep          112344
    12  Citrix             112345

data2
    Sr  name               patents
    11  Alphagrep          112344
    13  Taj                112322

I want to merge these dataframes row wise on [Key,Sr] & [patent, patents]

Which is: If row-data1[key]==row-data2[Sr] & row-data1[patent]==row-data2[patents], Merge or else append.

Result should be:

data1 + data2
    key name               patent
    11  Alphagrep          112344
    12  Citrix             112345
    13  Taj                112322

How should one do this in pandas?

Upvotes: 4

Views: 4634

Answers (2)

piRSquared
piRSquared

Reputation: 294228

set_index and combine_first

c1 = ['key', 'patent']
c2 = ['Sr', 'patents']

data1.set_index(c1) \
    .combine_first(
        data2.set_index(c2).rename_axis(c1)
    ).reset_index()

   key  patent       name
0   11  112344  Alphagrep
1   12  112345     Citrix
2   13  112322        Taj

Upvotes: 1

Mohammad Yusuf
Mohammad Yusuf

Reputation: 17054

You can do something like this:

print data1, '\n', '\n', data2, '\n'
data2.columns = data1.columns.tolist()
data3 = pd.concat([data1, data2]).drop_duplicates()
print data3

Output:

   key       name  patent
0   11  Alphagrep  112344
1   12     Citrix  112345 

   Sr       name  patents
0  11  Alphagrep   112344
1  13        Taj   112322 

   key       name  patent
0   11  Alphagrep  112344
1   12     Citrix  112345
1   13        Taj  112322

Upvotes: 5

Related Questions