Reputation: 4100
I need to match Name
from df1 to Item_Name
from df2. Wherever the name matches I also need Item_Id
and Material_Name
from df2.
I have two data frames:
Df1:
Original df has 1000+ Name
Id Name
1 Paper
2 Paper Bag
3 Scissors
4 Mat
5 Cat
6 Good Cat
2nd Df:
Original df has 1000+ Item_Name
Item_ID Item_Name Material_Name
1 Paper Bag Office
2 wallpaper Decor
3 paper Office
4 cat cage Animal Misc
5 good cat Animal
Expected Output:
Id Name Item_ID Material_Name
1 Paper 1,2,3 Office,Decor,Office
2 Paper Bag 1,2,3 Office,Decor,Office
3 Scissors NA NA
4 Mat NA NA
5 Cat 4,5 Animal Misc, Animal
6 Good Cat 4,5 Animal Misc,Animal
Code:
def matcher(query):
matches = [i['Item_ID'] for i in df2[['Item_ID','Name']].to_dict('records') if any(q in i['Name'].lower() for q in query.lower().split())]
if matches:
return ','.join(map(str, matches))
else:
return 'NA'
df1['Item_ID'] = df1['Name'].apply(matcher)
This worked properly when I need to have one column and currently I am running this code twice to get Item_ID
and Material_Name
.
ASK:
Need help if there is another way to not to run the function twice but I can get even 2 or 3 columns in one go
Upvotes: 1
Views: 57
Reputation: 164673
Here's one way using pd.DataFrame.loc
and reusing Boolean masks:
def matcher(x):
# construct 2-way mask
m1 = df2['Item_Name'].str.contains(x, regex=False, case=False)
m2 = [any(w in i.lower() for w in x.lower().split()) for i in df2['Item_Name']]
# apply 2-way mask
res_id = df2.loc[m1 | m2, 'Item_ID']
res_mat = df2.loc[m1 | m2, 'Material_Name']
return ','.join(res_id.astype(str)), ','.join(res_mat.astype(str))
df1[['Item_ID', 'Material_Name']] = pd.DataFrame(df1['Name'].apply(matcher).tolist())
print(df1)
Id Name Item_ID Material_Name
0 1 Paper 1,2,3 Office,Decor,Office
1 2 Paper Bag 1,2,3 Office,Decor,Office
2 3 Scissors
3 4 Mat
4 5 Cat 4,5 Animal Misc,Animal
5 6 Good Cat 4,5 Animal Misc,Animal
Upvotes: 1
Reputation: 2676
You can try getting both Item_ID
and Material_Name
as a tuple from your query, then apply the appropriate column with [i[0] for i in matches]
or [i[1] for i in matches]
.
def matcher(query):
matches = [(i['Item_ID'], i['Material_Name']) for i in df2[['Item_ID','Name']].to_dict('records') if any(q in i['Name'].lower() for q in df1['Name'].lower().split())]
if matches:
df1['Material_Name'].apply(','.join(map(str, [i[1] for i in matches])))
return ','.join(map(str, [i[0] for i in matches]))
else:
df1['Material_Name'].apply("NA")
return 'NA'
df1['Item_ID'] = df1['Name'].apply(matcher)
Upvotes: 0