Mohit Aneja
Mohit Aneja

Reputation: 33

updating and merging a column in pandas dataframe

I have a column 'A' in two dataframes, say df1 and df2.

df1:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 6 |  |
| ID3  | 7 |  |
| ID4  | 8 |  |

df2:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 1 |  |
| ID3  | 8 |  |
| ID5  | 7 |  |
| ID6  | 8 |  |
| ID7  | 9 |  |

Required updated df1:

|  ID  | A |  |
|------|---|--|
| ID1  | 5 |  |
| ID2  | 1 |  |
| ID3  | 8 |  |
| ID4  | 8 |  |
| ID5  | 7 |  |
| ID6  | 8 |  |
| ID7  | 9 |  |

I want to update the column 'A' in df1 with values from df2 if the ID is in df2, else I want to keep the same value in df1. Moreover, if there are new IDs in df2, I want to add the new values in df1.

I have seen the documentation of pd.DataFrame.update it does update the values from the df2 to df1 but it does not adds the new values to df1. Any help will be appreciated. Thanks in advance.

Upvotes: 1

Views: 56

Answers (2)

vibhu_singh
vibhu_singh

Reputation: 411

You can do the same thing using the merge method of Pandas.

You can merge df1 and df2 using the merge function as follows:

df = pd.merge(df2,df1, how ='outer')

Here I am merging df2 with df1. outer uses union of keys from both dataframes. Union will take all the keys from df2 and only those keys from df1 which are not available in df2.

enter image description here

The second step is to remove the duplicate rows considering ID column.

df = df.drop_duplicates(subset=['ID'],keep='first')

This will remove all the duplicate rows in column ID, except for the first occurrence.

enter image description here

The third step is to sort the rows considering ID column.

df =df.sort_values('ID')

enter image description here

Upvotes: 0

jezrael
jezrael

Reputation: 863361

I believe need:

df = pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last').sort_values('ID')
print (df)
    ID  A
0  ID1  5
1  ID2  1
2  ID3  8
3  ID4  8
3  ID5  7
4  ID6  8
5  ID7  9

Explanation:

First concat both DataFrame together:

print (pd.concat([df1, df2]))
    ID  A
0  ID1  5
1  ID2  6
2  ID3  7
3  ID4  8
0  ID1  5
1  ID2  1
2  ID3  8
3  ID5  7
4  ID6  8
5  ID7  9

Because same ID are created, remove duplicates by drop_duplicates with keep only last value:

print (pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last'))
    ID  A
3  ID4  8
0  ID1  5
1  ID2  1
2  ID3  8
3  ID5  7
4  ID6  8
5  ID7  9

And last sort by ID by sort_values.

Upvotes: 1

Related Questions