Reputation: 79
I have two DataFrames, df and df2 (which is a taxonomy). I want to search a column in df for the NarrowTerm values from df2 and, once one or more matches are found, return the corresponding BroadTerm value from df2. How do I do it?
My attempt is
import pandas as pd
df = pd.DataFrame({'Name':['a cat', 'grey puppy', 'red dog']})
df
df2 = pd.DataFrame({'BroadTerm':['cat', 'cat', 'dog', 'dog'], 'NarrowTerm':['cat', 'kitten', 'puppy', 'dog']})
NarrowTerm = df2.NarrowTerm.unique().tolist()
df['Animal'] = df['Name'].apply(lambda x: ','.join([part for part in NarrowTerm if part in x]))
df
which returns
         Name Animal
0       a cat    cat
1  grey puppy  puppy
2     red dog    dog
but I want it to return
         Name Animal
0       a cat    cat
1  grey puppy    dog
2     red dog    dog
UPDATED DATA
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['a cat dog - multiple', 'grey puppy - narrow term', 'a cat puppy', 'reddog - single no spaces', 'acatdog - multiple no spaces']})
Upvotes: 1
Views: 923
Reputation: 14949
This can be done without apply, using str.extract and map:
df['Animal'] = df['Name'].str.extract(pat = f"({'|'.join(df2.NarrowTerm)})")[0].map(dict(df2.iloc[:,::-1].values))
         Name Animal
0       a cat    cat
1  grey puppy    dog
2     red dog    dog
NOTE: To create the mapping dict, you can also use: pd.Series(df2.BroadTerm.values,index=df2.NarrowTerm).to_dict()
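For the updated data in the question, where a name can contain several terms or terms with no surrounding spaces, str.extract only returns the first match. A minimal sketch of one possible extension (not part of the original answer) collects every match with str.findall and maps each one to its broad term. The helper name to_broad and the comma-joined, de-duplicated output format are assumptions made for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple',
                            'grey puppy - narrow term',
                            'a cat puppy',
                            'reddog - single no spaces',
                            'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# NarrowTerm -> BroadTerm lookup
mapping = dict(zip(df2['NarrowTerm'], df2['BroadTerm']))

# Longest terms first so a longer term wins over a shorter one it contains;
# re.escape guards against regex metacharacters in the taxonomy.
pattern = '|'.join(re.escape(t) for t in sorted(mapping, key=len, reverse=True))

def to_broad(matches):
    # Hypothetical helper: map each matched narrow term to its broad term,
    # drop duplicates while keeping first-seen order, and join with commas.
    return ','.join(dict.fromkeys(mapping[m] for m in matches))

df['Animal'] = df['Name'].str.findall(pattern).apply(to_broad)
print(df)
```

Rows with no match end up with an empty string under this sketch; whether that or NaN is preferable depends on how the column is used afterwards.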
Upvotes: 3
Reputation: 571
You can split each string in df['Name'] into a list of tokens and convert the list into an np.array. Then use np.in1d() to check whether any token exists in df2['NarrowTerm']. If so, return the corresponding BroadTerm.
Try this:
df['Animal'] = df['Name'].apply(lambda x: df2.loc[np.in1d(df2.NarrowTerm, np.array(x.split())), 'BroadTerm'].values[0])
Output:
print(df)
         Name Animal
0       a cat    cat
1  grey puppy    dog
2     red dog    dog
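One caveat with the one-liner above: .values[0] raises an IndexError when no token in a name matches, and np.in1d is deprecated in recent NumPy releases in favour of np.isin. A hedged sketch of the same token-based idea with both points addressed follows; the helper name broad_term, the extra 'blue fish' row, and returning None for no match are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# 'blue fish' is a made-up row to show the no-match case
df = pd.DataFrame({'Name': ['a cat', 'grey puppy', 'red dog', 'blue fish']})
df2 = pd.DataFrame({'BroadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

def broad_term(name):
    # Boolean mask over the taxonomy rows whose NarrowTerm is one of the tokens
    mask = np.isin(df2['NarrowTerm'].to_numpy(), name.split())
    hits = df2.loc[mask, 'BroadTerm']
    # Return the first matching broad term, or None when nothing matched
    return hits.iloc[0] if not hits.empty else None

df['Animal'] = df['Name'].apply(broad_term)
print(df)
```

Because this approach splits on whitespace, it only matches whole tokens, so names such as 'reddog' from the updated data would not match.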
Upvotes: 1