DEs
DEs

Reputation: 83

Need help in - Merge 2 dataframes based on a column list but match the rows with NULL values in column

I have 2 dataframes which I have to join. I am trying to use merge on columns ID and STATUS. However the problem is if for a row status is NULL in df2 I want it to still match it based on just ID and bring the name. If STATUS has a value match it else match just the ID and bring the name.

mer_col_list = ['ID','STATUS']
df_out = pd.merge(df1,df2, on=mer_col_list, how='left')

df1 |ID | STATUS | NAME | |11 | ACTIVE | John | |22 | DORMANT| NICK | |33 | NOT_ACTIVE| HARRY|

df2 |ID | STATUS | BRANCH | |11 | DORMANT| USA | |11 | | USA | |22 | | UK | |33 | NOT_ACTIVE | AUS|

df_out: |ID| NAME | BRANCH| |11| JOHN | USA | |22| NICK | USA | |33|HARRY | AUS |

Upvotes: 0

Views: 486

Answers (1)

jezrael
jezrael

Reputation: 862761

You can create another left join by ID only if STATUS is missing and then combine both DataFrames by DataFrame.fillna:

df_out1 = df1.merge(df2, on=['ID','STATUS'], how='left')
df_out2 = df1.merge(df2[df2['STATUS'].isna()].drop('STATUS',axis=1), on=['ID'],how='left')

df_out = df_out1.fillna(df_out2)
print (df_out)
   ID      STATUS   NAME BRANCH
0  11      ACTIVE   John    USA
1  22     DORMANT   NICK     UK
2  33  NOT_ACTIVE  HARRY    AUS

You can also remove missing values first and duplicates if exist duplicated ['ID','STATUS'] for df2 and ID if remove missing valeus per STATUS:

df21 = df2.dropna(subset=['STATUS']).drop_duplicates(['ID','STATUS'])
df_out1 = df1.merge(df21, on=['ID','STATUS'], how='left')

df22 = df2[df2['STATUS'].isna()].drop('STATUS',axis=1).drop_duplicates(['ID'])
df_out2 = df1.merge(df22, on='ID',how='left')

df_out = df_out1.fillna(df_out2)
print (df_out)

Dynamic solution - if always ID is in list mer_col_list:

mer_col_list = ['ID','STATUS']

df21 = df2.dropna(subset=mer_col_list).drop_duplicates(mer_col_list)
df_out1 = df1.merge(df21, on=mer_col_list, how='left')

no_id_cols = np.setdiff1d(mer_col_list, ['ID']) 
print (no_id_cols)
['STATUS']

df22=df2[df2[no_id_cols].isna().any(axis=1)].drop(no_id_cols,axis=1).drop_duplicates(['ID'])
df_out2 = df1.merge(df22, on='ID',how='left')

df_out = df_out1.fillna(df_out2)
print (df_out)
   ID      STATUS   NAME BRANCH
0  11      ACTIVE   John    USA
1  22     DORMANT   NICK     UK
2  33  NOT_ACTIVE  HARRY    AUS

Upvotes: 1

Related Questions