How to fill NaN values with values contained in another column?

I want to check whether the split column contains any value from the Class list. If it does, I want to set the Category column to the matching value from the list. The Desired Category column below shows the result I am after.

Sample data:

domain             split    Category  Desired Category
[email protected]  XYT.com  Null      XYT
[email protected]  XTY.com  Null      Null
[email protected]  ssa.com  Null      ssa
[email protected]  bbc.com  Null      bbc
[email protected]  abk.com  Null      abk
[email protected]  ssb.com  Null      ssb
            
Class=['NaN','XYT','ssa','abk','abc','def','asds','ssb','bbc','XY','ab']    



for index, row in df.iterrows():
    for x in class:
        intersection=row.split.contains(x)
        if intersection:
           df.loc[index,'class'] = intersection

I just cannot get it right.

Please help, thanks.

Upvotes: 2

Views: 69

Answers (2)

Corralien

Reputation: 120479

Use str.extract. Build a regular expression that matches any one of the words in the list, and extract the word that matches (or NaN if none does).

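Update: The '|' operator is never greedy: the regex engine takes the first alternative that matches, even if a later one would produce a longer overall match. You therefore have to reverse sort your list manually, so that longer entries such as 'XYT' come before their prefixes such as 'XY'.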

lst = ['NaN','XY','ab','XYT','ssa','abk','abc','def','asds','ssb','bbc']
lst = sorted(lst, reverse=True)
pat = fr"({'|'.join(lst)})"

df['Category'] = df['split'].str.extract(pat)
>>> df
        domain    split Category
0  [email protected]  XYT.com      XYT
1  [email protected]  XTY.com      NaN
2  [email protected]  ssa.com      ssa
3  [email protected]  bbc.com      bbc
4  [email protected]  abk.com      abk
5  [email protected]  ssb.com      ssb

>>> lst
['ssb', 'ssa', 'def', 'bbc', 'asds', 'abk', 'abc', 'ab', 'XYT', 'XY', 'NaN']

>>> pat
'(ssb|ssa|def|bbc|asds|abk|abc|ab|XYT|XY|NaN)'
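A self-contained variant of the same idea, as a rough sketch: it rebuilds only the split column from the question, sorts by length (an assumption on my part, instead of the reverse alphabetical sort above) and escapes each entry with re.escape in case the list ever contains regex metacharacters. The names classes and ordered are made up for this example.

```python
import re

import pandas as pd

# Only the 'split' column from the question is reproduced here.
df = pd.DataFrame(
    {"split": ["XYT.com", "XTY.com", "ssa.com", "bbc.com", "abk.com", "ssb.com"]}
)
classes = ["NaN", "XYT", "ssa", "abk", "abc", "def", "asds", "ssb", "bbc", "XY", "ab"]

# Longest entries first, so 'XYT' is tried before its prefix 'XY'.
ordered = sorted(classes, key=len, reverse=True)

# re.escape guards against entries containing characters such as '.' or '+'.
pat = "(" + "|".join(re.escape(c) for c in ordered) + ")"

# expand=False returns a Series (one capture group) instead of a DataFrame.
df["Category"] = df["split"].str.extract(pat, expand=False)
print(df)
```

Rows with no match (XTY.com here) stay NaN, which matches the desired output in the question.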

Upvotes: 2

not_speshal

Reputation: 23156

Assuming there can only be a maximum of one match, try:

df["Category"] = df["split"].apply(lambda x: " ".join(c for c in Class if c in x))

>>> df
        domain    split Category
0  [email protected]  XYT.com   XYT
1  [email protected]  XTY.com      
2  [email protected]  ssa.com   ssa
3  [email protected]  bbc.com   bbc
4  [email protected]  abk.com   abk
5  [email protected]  ssb.com   ssb
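If the list has overlapping entries (the question's list contains both 'XY' and 'XYT', and both 'ab' and 'abk'), the join would concatenate every hit, so the one-match assumption matters. A possible workaround, sketched below, keeps only the longest hit per row. The helper longest_match is a hypothetical name for this example, and picking the longest hit is my assumption about the intended behaviour.

```python
import pandas as pd

df = pd.DataFrame(
    {"split": ["XYT.com", "XTY.com", "ssa.com", "bbc.com", "abk.com", "ssb.com"]}
)
classes = ["NaN", "XYT", "ssa", "abk", "abc", "def", "asds", "ssb", "bbc", "XY", "ab"]


def longest_match(text, candidates):
    """Return the longest candidate found in text, or None if nothing matches."""
    hits = [c for c in candidates if c in text]
    return max(hits, key=len) if hits else None


# Plain substring checks, applied row by row.
df["Category"] = df["split"].apply(lambda s: longest_match(s, classes))
print(df)
```

Unlike the join version, rows without a match get a missing value rather than an empty string.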

Upvotes: 1
