Map values by comparing 2 columns from different dataframes using partial match in python

Question

I have 2 dataframes df1 containing id_number and df2 containing identity_No.

Looking to map highest matching row values from Dataframe2 to Dataframe1 using conditions. Want to compare each row(df1['id_number']) from df1 with the whole column(df2['identity_No']) of df2. I have tried using partial match as well but not getting the output.

df1

score   id_number       company_name      company_code     match_acc     action_reqd
20      IN2231D           AXN pvt Ltd        IN225                          Yes
45      UK654IN        Aviva Intl Ltd        IN115                          No
65      SL1432H   Ship Incorporations        CZ555                          Yes
35      LK0678G  Oppo Mobiles pvt ltd        PQ795                          Yes
59      NG5678J             Nokia Inc        RS885                          No
20      IN2231D           AXN pvt Ltd        IN215                          Yes

df2

OR_score   identity_No       comp_name        comp_code   
51          UK654IN        Aviva Int.L Ltd       IN515  
25          SL6752J       Ship Inc Traders       CZ555  
79          NG5678K             Nokia Inc        RS005 
20          IN22312           AXN pvt Ltd        IN255
38          LK0665G       Oppo Mobiles ltd       PQ895

For Eg: The df1.id_number need to be compared with df2.identity_No, Looking to match based on row1 of df1['id_number'] will match across all rows of df2['identity_No'], and has highest match percentage wrt. row4 of df2['identity_No'] , and its more than 80%, it will copy the respective values from row4 of df2 to row1 of df1. same to be applied for each row of df1.

Expected Output:

score   id_number       company_name      company_code     match_acc     action_reqd
20      IN22312           AXN pvt Ltd        IN225              90          Yes
51      UK654IN       Aviva Int.L Ltd        IN115              100         No
25      SL1432H   Ship Incorporations        CZ555              30          Yes
38      LK0665G      Oppo Mobiles ltd        PQ795              80          Yes
79      NG5678K             Nokia Inc        RS885              85          No
20      IN22312           AXN pvt Ltd        IN225              90          Yes

I have tried this now:

for index, row in df1.iterrows():
    for index2, config2 in df2.iterrows():
        if process.extractOne(row["id_number"], df["identity_No"])[1] >=80:
            df1['id_number'][index] = config2['identity_No']
            df1['company_name'][index] = config2['comp_name']
            df1['company_code'][index] = config2['comp_code']
            df1['score'][index] = config2['OR_Score']

Attempt2

for index, row in df1.iterrows():
        for index2, config2 in df2.iterrows():
            if fuzz.partial_ratio(row["id_number"], config2["identity_No"]) >=80:

Please Suggest

Map values by comparing 2 columns from different dataframes using partial match in python

Answers (1)

Related Questions