Reputation: 616

Merge pandas dataframe with overwrite of columns

What is the quickest way to merge to python data frames in this manner? I have two data frames with similar structures (both have a primary key id and some value columns). What I want to do is merge the two data frames based on id. Are there any ways do this based on pandas operations? How I've implemented it right now is as coded below:

import pandas as pd

a = pd.DataFrame({'id': [1,2,3], 'letter': ['a', 'b', 'c']})
b = pd.DataFrame({'id': [1,3,4], 'letter': ['A', 'C', 'D']})
a_dict =  {e[id]: e for e in a.to_dict('record')}
b_dict =  {e[id]: e for e in b.to_dict('record')}
c_dict = a_dict.copy()
c_dict.update(b_dict)
c = pd.DataFrame(list(c.values())

Here, c would be equivalent to

pd.DataFrame({'id': [1,2,3,4], 'letter':['A','b', 'C', 'D']})

   id letter
0   1      A
1   2      b
2   3      C
3   4      D

Upvotes: 6

Answers (3)

piRSquared

Reputation: 294218

`combine_first`

If 'id' is your primary key, then use it as your index.

b.set_index('id').combine_first(a.set_index('id')).reset_index()

   id letter
0   1      A
1   2      b
2   3      C
3   4      D

`merge` with `groupby`

a.merge(b, 'outer', 'id').groupby(lambda x: x.split('_')[0], axis=1).last()

   id letter
0   1      A
1   2      b
2   3      C
3   4      D

Upvotes: 7

niraj

Reputation: 18208

One way may be as following:

append dataframe a to dataframe b
drop duplicates based on id
sort values on remaining by id
reset index and drop older index

You can try:

import pandas as pd

a = pd.DataFrame({'id': [1,2,3], 'letter': ['a', 'b', 'c']})
b = pd.DataFrame({'id': [1,3,4], 'letter': ['A', 'C', 'D']})

c = b.append(a).drop_duplicates(subset='id').sort_values('id').reset_index(drop=True)

print(c)

Upvotes: 1

Khalil Al Hooti

Reputation: 4506

Try this

c = pd.concat([a, b], axis=0).sort_values('letter').drop_duplicates('id', keep='first').sort_values('id')

c.reset_index(drop=True, inplace=True)

print(c)

    id  letter
0   1   A
1   2   b
2   3   C
3   4   D

Upvotes: 0

Merge pandas dataframe with overwrite of columns

Answers (3)

combine_first

merge with groupby

Related Questions

`combine_first`

`merge` with `groupby`