updating and merging a column in pandas dataframe

Question

I have a column 'A' in two dataframes, say df1 and df2.

df1:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 6 |  |
| ID3  | 7 |  |
| ID4  | 8 |  |

df2:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 1 |  |
| ID3  | 8 |  |
| ID5  | 7 |  |
| ID6  | 8 |  |
| ID7  | 9 |  |

Required updated df1:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 1 |  |
| ID3  | 8 |  |
| ID4  | 8 |  |
| ID5  | 7 |  |
| ID6  | 8 |  |
| ID7  | 9 |  |

I want to update the column 'A' in df1 with values from df2 if the ID is in df2, else I want to keep the same value in df1. Moreover, if there are new IDs in df2, I want to add the new values in df1.

I have seen the documentation of pd.DataFrame.update it does update the values from the df2 to df1 but it does not adds the new values to df1. Any help will be appreciated. Thanks in advance.

jezrael · Accepted Answer

I believe need:

df = pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last').sort_values('ID')
print (df)
    ID  A
0  ID1  5
1  ID2  1
2  ID3  8
3  ID4  8
3  ID5  7
4  ID6  8
5  ID7  9

Explanation:

First concat both DataFrame together:

print (pd.concat([df1, df2]))
    ID  A
0  ID1  5
1  ID2  6
2  ID3  7
3  ID4  8
0  ID1  5
1  ID2  1
2  ID3  8
3  ID5  7
4  ID6  8
5  ID7  9

Because same ID are created, remove duplicates by drop_duplicates with keep only last value:

print (pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last'))
    ID  A
3  ID4  8
0  ID1  5
1  ID2  1
2  ID3  8
3  ID5  7
4  ID6  8
5  ID7  9

And last sort by ID by sort_values.

updating and merging a column in pandas dataframe

Answers (2)

Related Questions