iSerd
iSerd

Reputation: 178

Check if Columns Value is in One Specific Column With Pandas

So I have one dataframe which has multiple columns and I want to try to find out if the values in my "Linked.." columns is in column called "New Names", and if it is then that specific cell value should be set in such way "cell.value - Yes" or if not that "cell.value - No"

import pandas as pd

d = {'New Names': ['a,b,c','a','c,d,e,f','a'], 'Linked Letter 0': 
['a','b','c','d'],
'Linked Letter 1': ['c','s','v','None'],
'Linked Letter 2': ['None','None','d','s']}

df_new = pd.DataFrame(data=d)

df_new


      Index   New Names   Linked Letter 0   Linked Letter 1   Linked Letter 2  
     ------- ----------- ----------------- ----------------- ----------------- 
        0       a,b,c       a                 c                 None             
        1       a           b                 s                 None             
        2       c,d,e,f     c                 v                 d                
        3       a           d                 None              s      

So expected result should be as following table;

    Index   New Names   Linked Letter 0   Linked Letter 1   Linked Letter 2  
    ------- ----------- ----------------- ----------------- ----------------- 
    0        a,b,c         a - YES           c - YES           None             
    1        a             b - NO            s - NO            None             
    2        c,d,e,f       c - YES           v - NO            d - YES          
    3        a             d - NO            None              s - NO    

One Problem with the solution provided below:

The problem is that mapping to YES and NO to values sometimes don't work as expected. For instance, same value which gets YES at the end can get NO in the next row even though the value in New Names column is same in both rows.

Why do you think this would occur?

Upvotes: 1

Views: 767

Answers (1)

jpp
jpp

Reputation: 164623

You can use pd.DataFrame.filter to filter your Linked columns, a list comprehension to construct a Boolean array, and finally loc with np.where for your conditional logic:

df = pd.DataFrame(data=d)

for col in df.filter(like='Linked'):
    bools = [link in new_names for link, new_names in zip(df[col], df['New Names'])]
    df.loc[df[col] != 'None', col] += pd.Series(np.where(bools, ' - YES', ' - NO'))

print(df)

  Linked Letter 0 Linked Letter 1 Linked Letter 2 New Names
0         a - YES         c - YES            None     a,b,c
1          b - NO          s - NO            None         a
2         c - YES          v - NO         d - YES   c,d,e,f
3          d - NO            None          s - NO         a

Upvotes: 2

Related Questions