Reputation: 911
I have two large data frames which I need to merge row-wise. These two dataframes may have a few overlapping rows.
Example:
data1
key name patent
11 Alphagrep 112344
12 Citrix 112345
data2
Sr name patents
11 Alphagrep 112344
13 Taj 112322
I want to merge these dataframes row wise on [Key,Sr] & [patent, patents]
Which is: If row-data1[key]==row-data2[Sr] & row-data1[patent]==row-data2[patents], Merge or else append.
Result should be:
data1 + data2
key name patent
11 Alphagrep 112344
12 Citrix 112345
13 Taj 112322
How should one do this in pandas?
Upvotes: 4
Views: 4634
Reputation: 294228
set_index
and combine_first
c1 = ['key', 'patent']
c2 = ['Sr', 'patents']
data1.set_index(c1) \
.combine_first(
data2.set_index(c2).rename_axis(c1)
).reset_index()
key patent name
0 11 112344 Alphagrep
1 12 112345 Citrix
2 13 112322 Taj
Upvotes: 1
Reputation: 17054
You can do something like this:
print data1, '\n', '\n', data2, '\n'
data2.columns = data1.columns.tolist()
data3 = pd.concat([data1, data2]).drop_duplicates()
print data3
Output:
key name patent
0 11 Alphagrep 112344
1 12 Citrix 112345
Sr name patents
0 11 Alphagrep 112344
1 13 Taj 112322
key name patent
0 11 Alphagrep 112344
1 12 Citrix 112345
1 13 Taj 112322
Upvotes: 5