Rahul Agarwal
Rahul Agarwal

Reputation: 4100

String Matching and get more than 1 column in Pandas

I need to match Name from df1 to Item_Name from df2. Wherever the name matches I also need Item_Id and Material_Name from df2.

I have two data frames:

Df1:

Original df has 1000+ Name

   Id    Name
    1     Paper
    2     Paper Bag
    3     Scissors
    4     Mat
    5     Cat
    6     Good Cat

2nd Df:

Original df has 1000+ Item_Name

Item_ID   Item_Name    Material_Name
1         Paper Bag      Office
2         wallpaper      Decor
3         paper          Office
4         cat cage       Animal Misc
5         good cat       Animal

Expected Output:

Id Name              Item_ID      Material_Name
1  Paper              1,2,3       Office,Decor,Office 
2  Paper Bag          1,2,3       Office,Decor,Office 
3  Scissors            NA         NA 
4  Mat                 NA         NA  
5  Cat                4,5         Animal Misc, Animal
6  Good Cat           4,5         Animal Misc,Animal

Code:

def matcher(query):

    matches = [i['Item_ID'] for i in df2[['Item_ID','Name']].to_dict('records') if any(q in i['Name'].lower() for q in query.lower().split())]
    if matches:
        return ','.join(map(str, matches))
    else:
        return 'NA'

df1['Item_ID'] = df1['Name'].apply(matcher)

This worked properly when I need to have one column and currently I am running this code twice to get Item_ID and Material_Name.

ASK:

Need help if there is another way to not to run the function twice but I can get even 2 or 3 columns in one go

Upvotes: 1

Views: 57

Answers (2)

jpp
jpp

Reputation: 164673

Here's one way using pd.DataFrame.loc and reusing Boolean masks:

def matcher(x):

    # construct 2-way mask
    m1 = df2['Item_Name'].str.contains(x, regex=False, case=False)
    m2 = [any(w in i.lower() for w in x.lower().split()) for i in df2['Item_Name']]

    # apply 2-way mask
    res_id = df2.loc[m1 | m2, 'Item_ID']
    res_mat = df2.loc[m1 | m2, 'Material_Name']

    return ','.join(res_id.astype(str)), ','.join(res_mat.astype(str))

df1[['Item_ID', 'Material_Name']] = pd.DataFrame(df1['Name'].apply(matcher).tolist())

print(df1)

   Id       Name Item_ID        Material_Name
0   1      Paper   1,2,3  Office,Decor,Office
1   2  Paper Bag   1,2,3  Office,Decor,Office
2   3   Scissors                             
3   4        Mat                             
4   5        Cat     4,5   Animal Misc,Animal
5   6   Good Cat     4,5   Animal Misc,Animal

Upvotes: 1

bunbun
bunbun

Reputation: 2676

You can try getting both Item_ID and Material_Name as a tuple from your query, then apply the appropriate column with [i[0] for i in matches] or [i[1] for i in matches].

def matcher(query):
    matches = [(i['Item_ID'], i['Material_Name']) for i in df2[['Item_ID','Name']].to_dict('records') if any(q in i['Name'].lower() for q in df1['Name'].lower().split())]

    if matches:
        df1['Material_Name'].apply(','.join(map(str, [i[1] for i in matches])))
        return ','.join(map(str, [i[0] for i in matches]))
    else:
        df1['Material_Name'].apply("NA")
        return 'NA'

df1['Item_ID'] = df1['Name'].apply(matcher)

Upvotes: 0

Related Questions