Reputation: 218
I am trying to create a Labels column to label my data. I've tried adapting answers from similar questions, but I've gotten stuck on what I specifically need.
My data looks like this (unfortunately I can't provide a real example):
Gene CLN
Gene1 cardiovascular
Gene2 Cardiovascular
Gene3 Neurological
Currently I am trying:
df['Labels'] = ['Probable' if df['CLN'].str.contains("cardio", case=False) else NA for x in df['CLN']]
The goal is to label every row whose CLN column contains a partial string match for 'cardio', and to leave all rows that don't match untouched. However, my code raises an error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().). I am not confident with Python, so I'm not sure what to do to address this.
Expected output:
Gene CLN Labels
Gene1 cardiovascular Probable
Gene2 Cardiovascular Probable
Gene3 Neurological NA
The long-term idea is that I could write another line whenever I want to add more labels than just Probable, doing something like:
df['Labels'] = ['Probable' if df['CLN'].str.contains("cardio", case=False) else NA for x in df['CLN']]
df['Labels'] = ['Unlikely' if df['CLN'].str.contains("neurological", case=False) else NA for x in df['CLN']]
But I'm worried these would cancel each other out, with the second line overwriting the labels from the first with NA.
Upvotes: 0
Views: 918
Reputation: 150735
Use np.select:
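import numpy as np

# Each condition is a boolean Series; the first matching condition picks the label.
df['Labels'] = np.select(
    [df['CLN'].str.contains("cardio", case=False),
     df['CLN'].str.contains("neurological", case=False)],
    ['Probable', 'Unlikely'],
    default=None  # or 'NA' if it fits you better; np.nan can raise a TypeError with string choices on recent NumPy
)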
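If you would rather keep the "one line per label" style from the question, a boolean mask with .loc is one possible alternative. The original error comes from using the whole Series returned by str.contains as an if condition, whereas a mask applies the label row by row. The sketch below rebuilds the sample data from the question; the names is_cardio and is_neuro are only illustrative, and na=False is an assumption that rows with a missing CLN should stay unlabelled.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Gene": ["Gene1", "Gene2", "Gene3"],
    "CLN": ["cardiovascular", "Cardiovascular", "Neurological"],
})

# Start with an empty (all-missing) Labels column.
df["Labels"] = None

# str.contains returns one boolean per row, so it can be used as a row mask.
# na=False (an assumption) treats a missing CLN value as "no match".
is_cardio = df["CLN"].str.contains("cardio", case=False, na=False)
df.loc[is_cardio, "Labels"] = "Probable"

# A second rule only touches its own matching rows, so earlier labels survive.
is_neuro = df["CLN"].str.contains("neurological", case=False, na=False)
df.loc[is_neuro, "Labels"] = "Unlikely"

print(df)
```

Because each assignment only writes to the rows selected by its mask, adding further labels later does not overwrite existing ones with NA. If a row matches several rules, the last assignment wins, whereas np.select keeps the first matching condition.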
Upvotes: 3