update dataframe with another with duplicated index

I am trying to update a dataframe according to another. I try to use df.update the issue that the index can be duplicated which raise the error : ValueError: cannot reindex from a duplicate axis

any idea how to do it?

For example:

d1 = pd.DataFrame({'ticket': ['a','b','c'], 'n': ['10','20','30'], 
                   'q': ['100','120','130'],'description': ['da','db','dc']})

enter image description here

d2 = pd.DataFrame({'ticket': ['a','a','b'], 'n': ['40','50','60'], 
                   'q': ['150','180','200']})

enter image description here

The expected result would be

pd.DataFrame({'ticket': ['a','a','b','c'], 'n': ['40','50','60', '30'], 
              'q': ['150','180','200','130'], 'description': ['da','da','db','dc']})

enter image description here

I try this d1.set_index('ticket', inplace=True) d1.update(d2.set_index('ticket')) but raise the above error

Upvotes: 6

Views: 928

Answers (1)

anky
anky

Reputation: 75080

Lets try combine_first with set_index and reindex to get desired column order:

final=(d2.set_index('ticket').combine_first(d1.set_index('ticket'))
                  .reset_index().reindex(columns=d1.columns))

  ticket   n    q description
0      a  40  150          da
1      a  50  180          da
2      b  60  200          db
3      c  30  130          dc

Upvotes: 5

Related Questions