Reputation: 33
I have a column 'A' in two dataframes, say df1 and df2.
df1:
| ID | A | |
|------|---|--|
| ID1 | 5 | |
| ID2 | 6 | |
| ID3 | 7 | |
| ID4 | 8 | |
df2:
| ID | A | |
|------|---|--|
| ID1 | 5 | |
| ID2 | 1 | |
| ID3 | 8 | |
| ID5 | 7 | |
| ID6 | 8 | |
| ID7 | 9 | |
Required updated df1:
| ID | A | |
|------|---|--|
| ID1 | 5 | |
| ID2 | 1 | |
| ID3 | 8 | |
| ID4 | 8 | |
| ID5 | 7 | |
| ID6 | 8 | |
| ID7 | 9 | |
I want to update the column 'A' in df1 with values from df2 if the ID is in df2, else I want to keep the same value in df1. Moreover, if there are new IDs in df2, I want to add the new values in df1.
I have seen the documentation of pd.DataFrame.update it does update the values from the df2 to df1 but it does not adds the new values to df1. Any help will be appreciated. Thanks in advance.
Upvotes: 1
Views: 56
Reputation: 411
You can do the same thing using the merge method of Pandas.
You can merge df1 and df2 using the merge function as follows:
df = pd.merge(df2,df1, how ='outer')
Here I am merging df2 with df1. outer uses union of keys from both dataframes. Union will take all the keys from df2 and only those keys from df1 which are not available in df2.
The second step is to remove the duplicate rows considering ID column.
df = df.drop_duplicates(subset=['ID'],keep='first')
This will remove all the duplicate rows in column ID, except for the first occurrence.
The third step is to sort the rows considering ID column.
df =df.sort_values('ID')
Upvotes: 0
Reputation: 863361
I believe need:
df = pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last').sort_values('ID')
print (df)
ID A
0 ID1 5
1 ID2 1
2 ID3 8
3 ID4 8
3 ID5 7
4 ID6 8
5 ID7 9
Explanation:
First concat
both DataFrame
together:
print (pd.concat([df1, df2]))
ID A
0 ID1 5
1 ID2 6
2 ID3 7
3 ID4 8
0 ID1 5
1 ID2 1
2 ID3 8
3 ID5 7
4 ID6 8
5 ID7 9
Because same ID
are created, remove duplicates by drop_duplicates
with keep only last value:
print (pd.concat([df1, df2]).drop_duplicates(subset=['ID'], keep='last'))
ID A
3 ID4 8
0 ID1 5
1 ID2 1
2 ID3 8
3 ID5 7
4 ID6 8
5 ID7 9
And last sort by ID
by sort_values
.
Upvotes: 1