How to fill NaN values with the values contained in another column?

Question

I want to check whether the split column contains any value from the Class list. If it does, I want to update the Category column with the matching value from the list. The Desired Category column shows the result I am aiming for.

Sample data

domain      split   Category    Desired Category
abc@XYT.com XYT.com Null         XYT
abb@XTY.com XTY.com Null         Null
abc@ssa.com ssa.com Null         ssa
bbb@bbc.com bbc.com Null         bbc
ccc@abk.com abk.com Null         abk
acc@ssb.com ssb.com Null         ssb
            
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']    



for index, row in df.iterrows():
    for x in class:
        intersection=row.split.contains(x)
        if intersection:
           df.loc[index,'class'] = intersection

I just cannot get it right.

Please help. Thanks.

Corralien · Accepted Answer

Use str.extract. Create a regular expression that matches any one of the words in the list and extract the word that matches (or NaN if none does).

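Update: Because the '|' operator is never greedy (alternatives are tried left to right and the first one that matches wins, even if a later one would produce a longer overall match), you have to reverse sort your list manually so that longer words come before their prefixes.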
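To see why the order matters, here is a minimal sketch using Python's standard re module. The variable names are illustrative and not part of the question. With the shorter word listed first, the search stops at 'XY' even though 'XYT' would also match:

```python
import re

text = "XYT.com"

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.search(r"(XY|XYT)", text).group(1)
long_first = re.search(r"(XYT|XY)", text).group(1)

print(short_first)  # XY
print(long_first)   # XYT
```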

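lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
# Reverse sort so that longer words come before their prefixes ('XYT' before 'XY')
lst = sorted(lst, reverse=True)
pat = fr"({'|'.join(lst)})"

# expand=False returns a Series; rows without a match get NaN
df['Category'] = df['split'].str.extract(pat, expand=False)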

>>> df
        domain    split Category
0  abc@XYT.com  XYT.com      XYT
1  abb@XTY.com  XTY.com      NaN
2  abc@ssa.com  ssa.com      ssa
3  bbb@bbc.com  bbc.com      bbc
4  ccc@abk.com  abk.com      abk
5  acc@ssb.com  ssb.com      ssb

>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']

>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
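For anyone who wants to run this end to end, here is a self-contained sketch built from the sample data in the question. Two details are assumptions added here rather than part of the answer above: re.escape guards against words that contain regex metacharacters, and sorting by length (longest first) is used as an alternative to the reverse alphabetical sort, with the same aim of trying 'XYT' before 'XY'. The split column is also assumed to be the part of the address after '@'.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "domain": ["abc@XYT.com", "abb@XTY.com", "abc@ssa.com",
               "bbb@bbc.com", "ccc@abk.com", "acc@ssb.com"],
})
# Assumption: 'split' is the part of the address after '@'.
df["split"] = df["domain"].str.split("@").str[1]

words = ['NaN', 'XYT', 'ssa', 'abk', 'abc', 'def', 'asds', 'ssb', 'bbc', 'XY', 'ab']

# Longest words first so 'XYT' is tried before 'XY' and 'abk' before 'ab'.
ordered = sorted(words, key=len, reverse=True)
pat = "(" + "|".join(re.escape(w) for w in ordered) + ")"

# expand=False returns a Series; rows without a match become NaN.
df["Category"] = df["split"].str.extract(pat, expand=False)
print(df)
```

Row 1 ('XTY.com') stays NaN because no word in the list occurs in it, which matches the Desired Category column in the question.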
