Proper way to merge data in Pandas

Question

I have two DataFrames that I would like to merge into one, but they share some columns, as in the example below:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1,2,3], 'B': [4,5,3], 'C': [7,8,9]})
df2 = pd.DataFrame({'A': [3,4,5], 'B': [6,7,8]})

There are two merging criteria:

1. Use column A as the key to merge on.
2. If there are common columns, use the values from df1 and ignore those in df2.

So the resulting DataFrame would look like this:

   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  3  9.0
3  4  7  NaN
4  5  8  NaN

How should I write this merge? Thanks in advance.

jezrael · Accepted Answer

Use concat with DataFrame.drop_duplicates by column A:

df = pd.concat([df1, df2], sort=False, ignore_index=True).drop_duplicates('A')
print(df)
   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  3  9.0
4  4  7  NaN
5  5  8  NaN
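
Note that the output above keeps the row labels from the concatenated frame (0, 1, 2, 4, 5), while the expected result in the question is numbered 0 to 4. A minimal sketch of one way to renumber the rows, plus a possible alternative using combine_first, is below. The variable names merged and alt are only illustrative, and the combine_first variant is a suggestion of mine rather than part of the accepted answer:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 3], 'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [3, 4, 5], 'B': [6, 7, 8]})

# df1 comes first in the concat, and drop_duplicates keeps the first
# occurrence by default, so df1's row wins whenever a key in A is shared.
merged = (
    pd.concat([df1, df2], sort=False, ignore_index=True)
      .drop_duplicates('A')
      .reset_index(drop=True)  # renumber the rows 0..n-1
)
print(merged)

# Alternative sketch: index both frames by A and give df1 priority.
alt = df1.set_index('A').combine_first(df2.set_index('A')).reset_index()
print(alt)
```

The two approaches are not fully equivalent: combine_first fills any missing values in df1 with values from df2 for the same key, whereas the concat version keeps df1's row as-is. For the sample data there are no missing values in df1, so both give the same rows.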
