Winston

Reputation: 1428

Proper way to merge data in Pandas

I have two DataFrames that share some columns, and I would like to merge them into one. For example:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [1,2,3], 'B': [4,5,3], 'C': [7,8,9]})
df2 = pd.DataFrame({'A': [3,4,5], 'B': [6,7,8]})

The merge has to satisfy two criteria:

  1. Use Column A as a key to merge
  2. When a key appears in both DataFrames, keep the values from df1 in the common columns and ignore those from df2

The resulting DataFrame should look like this:

   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  3  9.0
3  4  7  NaN
4  5  8  NaN

How should I write this merge? Thanks in advance.

Upvotes: 1

Views: 40

Answers (2)

jezrael

Reputation: 863166

Use pd.concat followed by DataFrame.drop_duplicates on column A:

df = pd.concat([df1, df2], sort=False, ignore_index=True).drop_duplicates('A')
print(df)
   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  3  9.0
4  4  7  NaN
5  5  8  NaN
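
As a rough illustration of why this meets the second criterion: drop_duplicates defaults to keep='first', and df1 comes first in the concat list, so the df1 row wins for any key found in both frames. The sketch below makes that default explicit and contrasts it with keep='last'; the names kept_first and kept_last are only illustrative and are not part of the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 3], 'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [3, 4, 5], 'B': [6, 7, 8]})

# Stack df1 on top of df2; the key A == 3 now appears twice.
combined = pd.concat([df1, df2], sort=False, ignore_index=True)

# keep='first' (the default) retains the df1 row for A == 3.
kept_first = combined.drop_duplicates('A', keep='first')

# keep='last' would retain the df2 row instead, which violates criterion 2.
kept_last = combined.drop_duplicates('A', keep='last')

print(kept_first)
print(kept_last)
```

If df2 should take priority instead, reversing the list order or passing keep='last' would be the corresponding change.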

Upvotes: 1

Dani Mesejo

Reputation: 61910

You could use pd.concat and then drop the rows whose value in column 'A' has already appeared:

merged = pd.concat([df1, df2], sort=False)

mask = ~merged.duplicated('A')

print(merged[mask])

Output

   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  3  9.0
1  4  7  NaN
2  5  8  NaN
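
Note that the index labels 1 and 2 each appear twice in this output, because pd.concat keeps the original labels of both frames. A minimal sketch, assuming a fresh 0..n-1 index is wanted as in the question's expected result, could inspect the mask and then reset the index; the name result is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 3], 'C': [7, 8, 9]})
df2 = pd.DataFrame({'A': [3, 4, 5], 'B': [6, 7, 8]})

merged = pd.concat([df1, df2], sort=False)

# duplicated('A') flags every repeat of a key after its first occurrence,
# so the negated mask is False only for the df2 row with A == 3.
mask = ~merged.duplicated('A')
print(mask.tolist())

# The filtered frame still carries repeated index labels from df1 and df2;
# reset_index(drop=True) replaces them with 0..n-1.
result = merged[mask].reset_index(drop=True)
print(result)
```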

Upvotes: 1
