Reputation: 743
I am trying to update a dataframe according to another. I try to use df.update the issue that the index can be duplicated which raise the error : ValueError: cannot reindex from a duplicate axis
any idea how to do it?
For example:
d1 = pd.DataFrame({'ticket': ['a','b','c'], 'n': ['10','20','30'],
'q': ['100','120','130'],'description': ['da','db','dc']})
d2 = pd.DataFrame({'ticket': ['a','a','b'], 'n': ['40','50','60'],
'q': ['150','180','200']})
The expected result would be
pd.DataFrame({'ticket': ['a','a','b','c'], 'n': ['40','50','60', '30'],
'q': ['150','180','200','130'], 'description': ['da','da','db','dc']})
I try this
d1.set_index('ticket', inplace=True)
d1.update(d2.set_index('ticket'))
but raise the above error
Upvotes: 6
Views: 928
Reputation: 75080
Lets try combine_first
with set_index
and reindex
to get desired column order:
final=(d2.set_index('ticket').combine_first(d1.set_index('ticket'))
.reset_index().reindex(columns=d1.columns))
ticket n q description
0 a 40 150 da
1 a 50 180 da
2 b 60 200 db
3 c 30 130 dc
Upvotes: 5