Reputation: 178
So I have one dataframe which has multiple columns and I want to try to find out if the values in my "Linked.." columns is in column called "New Names", and if it is then that specific cell value should be set in such way "cell.value - Yes" or if not that "cell.value - No"
import pandas as pd
d = {'New Names': ['a,b,c','a','c,d,e,f','a'], 'Linked Letter 0':
['a','b','c','d'],
'Linked Letter 1': ['c','s','v','None'],
'Linked Letter 2': ['None','None','d','s']}
df_new = pd.DataFrame(data=d)
df_new
Index New Names Linked Letter 0 Linked Letter 1 Linked Letter 2
------- ----------- ----------------- ----------------- -----------------
0 a,b,c a c None
1 a b s None
2 c,d,e,f c v d
3 a d None s
So expected result should be as following table;
Index New Names Linked Letter 0 Linked Letter 1 Linked Letter 2
------- ----------- ----------------- ----------------- -----------------
0 a,b,c a - YES c - YES None
1 a b - NO s - NO None
2 c,d,e,f c - YES v - NO d - YES
3 a d - NO None s - NO
One Problem with the solution provided below:
The problem is that mapping to YES and NO to values sometimes don't work as expected. For instance, same value which gets YES at the end can get NO in the next row even though the value in New Names column is same in both rows.
Why do you think this would occur?
Upvotes: 1
Views: 767
Reputation: 164623
You can use pd.DataFrame.filter
to filter your Linked
columns, a list comprehension to construct a Boolean array, and finally loc
with np.where
for your conditional logic:
df = pd.DataFrame(data=d)
for col in df.filter(like='Linked'):
bools = [link in new_names for link, new_names in zip(df[col], df['New Names'])]
df.loc[df[col] != 'None', col] += pd.Series(np.where(bools, ' - YES', ' - NO'))
print(df)
Linked Letter 0 Linked Letter 1 Linked Letter 2 New Names
0 a - YES c - YES None a,b,c
1 b - NO s - NO None a
2 c - YES v - NO d - YES c,d,e,f
3 d - NO None s - NO a
Upvotes: 2