Alexandru Costinel
Alexandru Costinel

Reputation: 99

Concate 2 dfs by a condition

I have 2 dfs

import pandas as pd
list_columns = ['Number', 'Name', 'Age']
list_data = [
    [121, 'John', 25],
    [122, 'Sam', 26]
    ]
df1 = pd.DataFrame(columns=list_columns, data=list_data)

   Number  Name  Age
0     121  John   25
1     122   Sam   26

and

list_columns = ['Number', 'Name', 'Age']
list_data = [
    [121, 'John', 31],
    [122, 'Sam', 29],
    [123, 'Andrew', 28]
    ]
df2 = pd.DataFrame(columns=list_columns, data=list_data)

   Number    Name  Age
0     121    John   31
1     122     Sam   29
2     123  Andrew   28

In the end I want to take the missing values from df2 and add them into df1 bassed on the column Number.

In the above case in df1 I am missing only the Number 123, and I want to move only the data from this line to df1, so it will lok like

|Number|Name  | Age|
| 121  |John  | 25 |
| 122  |Sam   | 26 |
| 123  |Andrew| 28 |

I tried to use concat with keep= 'First' but I am afraid that if a have lot of data it will alterate the existing data in df1(I want to add only missing data based on Number).

Is there a better way of achieving this? this how I tried to concat

pd.concat([df1,df2]).drop_duplicates(['Number'],keep='first')

Upvotes: 2

Views: 27

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use DataFrame.set_index on df1 and df2 to set the index as column Number and use DataFrame.combine_first:

df = (
    df1.set_index('Number').combine_first(
        df2.set_index('Number')).reset_index()
)

Result:

   Number    Name   Age
0     121    John  25.0
1     122     Sam  26.0
2     123  Andrew  28.0

Upvotes: 1

Related Questions