Jake

Iterating over a data frame and replacing value depending on a condition

I am new to Python (coming from R) and I cannot figure out how to iterate over a data frame in Python. Below I have provided a data frame and a list of possible interventions (intervention_list). I want to search through the "Intervention" column: if the value is in intervention_list, replace it with "Yes Intervention", and if it is "NaN", replace it with "No Intervention".

Any guidance or help would be appreciated.

import pandas as pd
intervention_list = ['Intervention 1', 'Intervention 2']
df = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
                  'Intervention':['Intervention 1', 'NaN','NaN','NaN','Intervention 2','Intervention 1','NaN']})
print(df)
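A side note on the sample above: the 'NaN' entries are the three-character string 'NaN', not missing values, so pandas' isna() does not flag them. A small sketch that checks this on the same data (the variable names n_missing and n_string_nan are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'ID': [100, 200, 300, 400, 500, 600, 700],
                   'Intervention': ['Intervention 1', 'NaN', 'NaN', 'NaN',
                                    'Intervention 2', 'Intervention 1', 'NaN']})

# 'NaN' typed as a string is ordinary text, so isna() finds no missing values
n_missing = int(df['Intervention'].isna().sum())

# comparing against the literal string finds the four placeholder rows
n_string_nan = int(df['Intervention'].eq('NaN').sum())

print(n_missing, n_string_nan)  # 0 4
```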

I am hoping the finished data frame would look like this:

df_new = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
                  'Intervention':['Yes Intervention', 'No Intervention','No Intervention','No Intervention','Yes Intervention','Yes Intervention','No Intervention']})
print(df_new)

Thank you!

Answers (1)

jezrael

In pandas it is best to avoid loops because they are slow. For a vectorized solution, use numpy.where and test for missing values with Series.isna or Series.notna:

import numpy as np

df['Intervention'] = np.where(df['Intervention'].isna(), 'No Intervention', 'Yes Intervention')

Or:

df['Intervention'] = np.where(df['Intervention'].notna(), 'Yes Intervention', 'No Intervention')
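A self-contained sketch of the isna() version, assuming the missing entries are real missing values (np.nan) rather than the string 'NaN'. The data frame here is a hypothetical variant of the question's sample:

```python
import numpy as np
import pandas as pd

# hypothetical variant of the sample data that uses real missing values (np.nan)
df = pd.DataFrame({'ID': [100, 200, 300, 400, 500, 600, 700],
                   'Intervention': ['Intervention 1', np.nan, np.nan, np.nan,
                                    'Intervention 2', 'Intervention 1', np.nan]})

# isna() is True for np.nan, so those rows become 'No Intervention';
# every other row becomes 'Yes Intervention'
df['Intervention'] = np.where(df['Intervention'].isna(),
                              'No Intervention', 'Yes Intervention')
print(df['Intervention'].tolist())
```

np.where evaluates the whole column at once instead of visiting rows one by one, which is why it is preferred over an explicit loop.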

If the NaN values are strings (as in the sample data, where 'NaN' is typed as text), test with == or Series.eq instead:

df['Intervention'] = np.where(df['Intervention'].eq('NaN'), 'No Intervention', 'Yes Intervention')
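A runnable sketch of the string comparison on the question's own data, checking the result against the desired output from the question (the list name expected is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [100, 200, 300, 400, 500, 600, 700],
                   'Intervention': ['Intervention 1', 'NaN', 'NaN', 'NaN',
                                    'Intervention 2', 'Intervention 1', 'NaN']})

# the placeholders are the literal text 'NaN', so compare with eq() instead of isna()
df['Intervention'] = np.where(df['Intervention'].eq('NaN'),
                              'No Intervention', 'Yes Intervention')

# desired output from the question (df_new's Intervention column)
expected = ['Yes Intervention', 'No Intervention', 'No Intervention',
            'No Intervention', 'Yes Intervention', 'Yes Intervention',
            'No Intervention']
print(df['Intervention'].tolist() == expected)  # True
```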

If you also need to test whether the value is in intervention_list, use numpy.select:

m1 = df['Intervention'].isin(intervention_list)
m2 = df['Intervention'].isna()

# option 1: rows matching neither m1 nor m2 get the default None
df['Intervention'] = np.select([m1, m2],
                              ['Yes Intervention','No Intervention'],
                              default=None)

# option 2: rows matching neither m1 nor m2 keep their original Intervention value
df['Intervention'] = np.select([m1, m2],
                              ['Yes Intervention','No Intervention'],
                              default=df['Intervention'])
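A self-contained sketch of the numpy.select approach (option 2). It assumes the 'NaN' placeholder text is first converted to a real missing value so that m2 (isna()) can match it. The value 'Intervention 3' is hypothetical and is included only to show a row that matches neither condition:

```python
import numpy as np
import pandas as pd

intervention_list = ['Intervention 1', 'Intervention 2']

# 'Intervention 3' is a hypothetical value that is in neither condition
df = pd.DataFrame({'ID': [100, 200, 300, 400],
                   'Intervention': ['Intervention 1', 'NaN',
                                    'Intervention 3', 'Intervention 2']})

# turn the 'NaN' placeholder text into a real missing value so isna() sees it
df['Intervention'] = df['Intervention'].replace('NaN', np.nan)

m1 = df['Intervention'].isin(intervention_list)  # value is a known intervention
m2 = df['Intervention'].isna()                   # value is missing

# np.select takes the first matching condition per row;
# rows matching neither keep their original value via default
df['Intervention'] = np.select([m1, m2],
                               ['Yes Intervention', 'No Intervention'],
                               default=df['Intervention'])
print(df['Intervention'].tolist())
# ['Yes Intervention', 'No Intervention', 'Intervention 3', 'Yes Intervention']
```

Keeping the original value as the default makes unexpected entries easy to spot, whereas default=None would silently blank them out.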
