Reputation: 99
I have 2 dfs
import pandas as pd
list_columns = ['Number', 'Name', 'Age']
list_data = [
[121, 'John', 25],
[122, 'Sam', 26]
]
df1 = pd.DataFrame(columns=list_columns, data=list_data)
Number Name Age
0 121 John 25
1 122 Sam 26
and
list_columns = ['Number', 'Name', 'Age']
list_data = [
[121, 'John', 31],
[122, 'Sam', 29],
[123, 'Andrew', 28]
]
df2 = pd.DataFrame(columns=list_columns, data=list_data)
Number Name Age
0 121 John 31
1 122 Sam 29
2 123 Andrew 28
In the end I want to take the missing values from df2 and add them into df1 bassed on the column Number.
In the above case in df1 I am missing only the Number 123, and I want to move only the data from this line to df1, so it will lok like
|Number|Name | Age|
| 121 |John | 25 |
| 122 |Sam | 26 |
| 123 |Andrew| 28 |
I tried to use concat
with keep= 'First'
but I am afraid that if a have lot of data it will alterate the existing data in df1(I want to add only missing data based on Number).
Is there a better way of achieving this? this how I tried to concat
pd.concat([df1,df2]).drop_duplicates(['Number'],keep='first')
Upvotes: 2
Views: 27
Reputation: 71689
Use DataFrame.set_index
on df1
and df2
to set the index as column Number
and use DataFrame.combine_first
:
df = (
df1.set_index('Number').combine_first(
df2.set_index('Number')).reset_index()
)
Result:
Number Name Age
0 121 John 25.0
1 122 Sam 26.0
2 123 Andrew 28.0
Upvotes: 1