DN1

Reputation: 218

How to label rows based on an if statement in Python?

I am trying to create a Labels column to label my data. I've tried adapting answers from similar questions, but I've gotten stuck on what I specifically need.

My data looks like this (unfortunately I can't provide a real example):

Gene   CLN
Gene1  cardiovascular
Gene2  Cardiovascular
Gene3  Neurological

Currently I am trying:

df['Labels'] = ['Probable' if df['CLN'].str.contains("cardio", case=False) else NA for x in df['CLN']]

The goal is to label every row whose CLN value contains a partial string match for 'cardio', and to leave any rows that don't match unlabeled. However, my code gives an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am not confident with Python, so I'm not sure what to do to address this.

Expected output:

Gene   CLN             Labels
Gene1  cardiovascular  Probable
Gene2  Cardiovascular  Probable
Gene3  Neurological    NA

The long-term idea is that I could write another line whenever I want to add more labels than just Probable, doing something like:

df['Labels'] = ['Probable' if df['CLN'].str.contains("cardio", case=False) else NA for x in df['CLN']]
df['Labels'] = ['Unlikely' if df['CLN'].str.contains("neurological", case=False) else NA for x in df['CLN']]

But I'm worried these would cancel each other out, with NA from the later line overwriting labels set by the earlier one.

Upvotes: 0

Views: 918

Answers (1)

Quang Hoang

Reputation: 150735

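Use np.select. It takes a list of boolean conditions and a matching list of labels, and rows that match none of the conditions get the default:

import numpy as np

df['Labels'] = np.select(
    [df['CLN'].str.contains("cardio", case=False),         # condition 1
     df['CLN'].str.contains("neurological", case=False)],  # condition 2
    ['Probable', 'Unlikely'],                               # label per condition
    # With string labels, np.nan is coerced to the text 'nan' or rejected,
    # depending on the NumPy version, so use a string default.
    default='NA',
)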
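
For context on the error in the question: df['CLN'].str.contains(...) returns a whole boolean Series (one value per row), and a plain `if` needs a single True/False, which is why pandas raises the "truth value of a Series is ambiguous" error. Below is a minimal, self-contained sketch built from the question's sample rows. The Gene4 row and the Labels2 column are hypothetical additions for illustration, and na=False is an assumption that missing CLN values should count as "no match".

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the Gene4 row is a hypothetical
# extra, added only to show what happens when nothing matches.
df = pd.DataFrame({
    "Gene": ["Gene1", "Gene2", "Gene3", "Gene4"],
    "CLN": ["cardiovascular", "Cardiovascular", "Neurological", "Metabolic"],
})

# Each condition is a boolean Series with one True/False per row.
# na=False treats missing CLN values as non-matching (an assumption).
is_cardio = df["CLN"].str.contains("cardio", case=False, na=False)
is_neuro = df["CLN"].str.contains("neurological", case=False, na=False)

# The first matching condition wins; unmatched rows receive the default.
df["Labels"] = np.select(
    [is_cardio, is_neuro],
    ["Probable", "Unlikely"],
    default="NA",
)

# Alternative sketch: assign one label at a time with .loc, which only
# touches the matching rows and so never overwrites earlier labels with NA.
df["Labels2"] = pd.NA
df.loc[is_cardio, "Labels2"] = "Probable"
df.loc[is_neuro, "Labels2"] = "Unlikely"

print(df)
```

The .loc variant may suit the "add another line later" workflow from the question, since each new label is a single extra assignment and rows that never match simply stay missing.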

Upvotes: 3
