Reputation: 83
I want to check whether the split column contains any value from the Class list. If it does, I want to set the Category column to the matching value from the list. The Desired Category column shows the result I am after.
domain           split    Category  Desired Category
[email protected]  XYT.com  Null      XYT
[email protected]  XTY.com  Null      Null
[email protected]  ssa.com  Null      ssa
[email protected]  bbc.com  Null      bbc
[email protected]  abk.com  Null      abk
[email protected]  ssb.com  Null      ssb
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']
for index, row in df.iterrows():
    for x in class:
        intersection=row.split.contains(x)
        if intersection:
            df.loc[index,'class'] = intersection
I just cannot get it right.
Please help, thanks.
Upvotes: 2
Views: 69
Reputation: 120479
Use str.extract. Create a regular expression that matches any of the words in the list and extract the word that matches (or NaN if none does).
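Update: regex alternation ('|') tries its alternatives from left to right and stops at the first one that matches, even when a later alternative would give a longer match. With 'XY' listed before 'XYT', the pattern would extract 'XY' from 'XYT.com'. Reverse-sorting the list puts every word ahead of its own prefixes, so the longer word is tried first: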
lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
lst = sorted(lst, reverse=True)
pat = fr"({'|'.join(lst)})"
df['Category'] = df['split'].str.extract(pat)
>>> df
            domain    split  Category
0  [email protected]  XYT.com       XYT
1  [email protected]  XTY.com       NaN
2  [email protected]  ssa.com       ssa
3  [email protected]  bbc.com       bbc
4  [email protected]  abk.com       abk
5  [email protected]  ssb.com       ssb
>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']
>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
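The ordering issue can be checked with the re module alone. The following is a minimal sketch, not part of the original answer: the names words, ordered and pattern are illustrative, sorting by length (rather than reverse alphabetical order) is a swapped-in variant, and re.escape is added on the assumption that the list might one day contain regex metacharacters such as '.' or '+'.

```python
import re

words = ['NaN', 'XY', 'ab', 'XYT', 'ssa', 'abk', 'abc', 'def', 'asds', 'ssb', 'bbc']

# Alternation keeps the first alternative that matches at a position,
# so the order of the alternatives decides the result.
first_wins = re.search("XY|XYT", "XYT.com").group(0)    # 'XY'
longer_first = re.search("XYT|XY", "XYT.com").group(0)  # 'XYT'

# Variant of the answer's sort: longest words first, each one escaped
# so that it is matched literally.
ordered = sorted(words, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(w) for w in ordered) + ")"

# Apply the pattern to the sample values of the split column.
splits = ["XYT.com", "XTY.com", "ssa.com", "bbc.com", "abk.com", "ssb.com"]
found = []
for s in splits:
    m = re.search(pattern, s)
    found.append(m.group(1) if m else None)
```

The resulting pattern has a single capture group, so it should be usable with df['split'].str.extract(pattern) in the same way as pat above.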
Upvotes: 2
Reputation: 23156
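Assuming each value of split contains at most one entry of Class (if several entries match, for example both 'XY' and 'XYT', all of them are joined with a space), try: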
df["Category"] = df["split"].apply(lambda x: " ".join(c for c in Class if c in x))
>>> df
            domain    split  Category
0  [email protected]  XYT.com       XYT
1  [email protected]  XTY.com
2  [email protected]  ssa.com       ssa
3  [email protected]  bbc.com       bbc
4  [email protected]  abk.com       abk
5  [email protected]  ssb.com       ssb
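With the Class list exactly as posted in the question, 'XY' and 'ab' are substrings of 'XYT' and 'abk', so the one-match assumption does not hold for every row. The sketch below is one possible way to handle that, not the answerer's code: longest_match is a hypothetical helper that keeps only the longest matching entry, and the DataFrame is rebuilt from the question's split values.

```python
import pandas as pd

Class = ['NaN', 'XYT', 'ssa', 'abk', 'abc', 'def', 'asds', 'ssb', 'bbc', 'XY', 'ab']

def longest_match(value, candidates):
    """Return the longest candidate contained in value, or None if there is none."""
    hits = [c for c in candidates if c in value]
    return max(hits, key=len) if hits else None

df = pd.DataFrame({"split": ["XYT.com", "XTY.com", "ssa.com",
                             "bbc.com", "abk.com", "ssb.com"]})

# The join-based version returns every entry that matches, space-separated.
joined = df["split"].apply(lambda x: " ".join(c for c in Class if c in x))

# Keeping only the longest hit gives one value per row (None when nothing matches).
df["Category"] = df["split"].apply(lambda x: longest_match(x, Class))
```

Here joined holds 'XYT XY' and 'abk ab' for the first and fifth rows, while Category holds 'XYT' and 'abk'.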
Upvotes: 1