Reputation: 474
I am new to Python (coming from R) and I cannot figure out how to iterate over a data frame in Python. I have provided a data frame below and a list of possible "Interventions". What I am attempting to do is search through the "Intervention" column in the data frame and, if the intervention is in the "intervention_list", replace the value with "Yes Intervention", but if it is "NaN", replace it with "No Intervention".
Any guidance or help would be appreciated.
import pandas as pd
intervention_list = ['Intervention 1', 'Intervention 2']
df = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
'Intervention':['Intervention 1', 'NaN','NaN','NaN','Intervention 2','Intervention 1','NaN']})
print(df)
I am hoping the finished data frame would look like this:
df_new = pd.DataFrame({'ID':[100,200,300,400,500,600,700],
'Intervention':['Yes Intervention', 'No Intervention','No Intervention','No Intervention','Yes Intervention','Yes Intervention','No Intervention']})
print(df_new)
Thank you!
Upvotes: 1
Views: 564
Reputation: 862511
In pandas it is best to avoid loops because they are slow, so for a vectorized solution use numpy.where and test for missing values with Series.isna or Series.notna:
import numpy as np
df['Intervention'] = np.where(df['Intervention'].isna(),'No Intervention','Yes Intervention')
Or:
df['Intervention'] = np.where(df['Intervention'].notna(),'Yes Intervention','No Intervention')
If NaN is stored as the string 'NaN' (as in the sample data), test with == or Series.eq instead:
df['Intervention']=np.where(df['Intervention'].eq('NaN'),'No Intervention','Yes Intervention')
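To see why the string case needs its own test, here is a minimal sketch on a shortened copy of the sample data (the three-row frame and the name cleaned are just for illustration). It shows that isna does not flag the literal string 'NaN', and that one option is to convert those placeholders to real missing values first:

```python
import numpy as np
import pandas as pd

# Shortened, illustrative copy of the question's data
df = pd.DataFrame({'ID': [100, 200, 300],
                   'Intervention': ['Intervention 1', 'NaN', 'Intervention 2']})

# The string 'NaN' is an ordinary value, so isna() does not flag it
print(df['Intervention'].isna().tolist())     # [False, False, False]

# Comparing against the string does
print(df['Intervention'].eq('NaN').tolist())  # [False, True, False]

# Optionally turn the placeholder strings into real missing values
cleaned = df['Intervention'].replace('NaN', np.nan)
print(cleaned.isna().tolist())                # [False, True, False]
```

After the conversion, the isna / notna versions above work unchanged.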
But if you also need to test whether the value is in the list, use numpy.select:
m1 = df['Intervention'].isin(intervention_list)
m2 = df['Intervention'].isna()
# rows matching neither m1 nor m2 get the default, None
df['Intervention'] = np.select([m1, m2],
['Yes Intervention','No Intervention'],
default=None)
# alternatively, rows matching neither m1 nor m2 keep their original Intervention value
df['Intervention'] = np.select([m1, m2],
['Yes Intervention','No Intervention'],
default=df['Intervention'])
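Putting it together for the data in the question, here is a self-contained sketch. It assumes the missing values really are the string 'NaN' as posted, so m2 is widened to catch both real missing values and that string; that combined mask is my adjustment for the sample data, not a requirement of numpy.select:

```python
import numpy as np
import pandas as pd

intervention_list = ['Intervention 1', 'Intervention 2']
df = pd.DataFrame({'ID': [100, 200, 300, 400, 500, 600, 700],
                   'Intervention': ['Intervention 1', 'NaN', 'NaN', 'NaN',
                                    'Intervention 2', 'Intervention 1', 'NaN']})

# Rows whose value is one of the known interventions
m1 = df['Intervention'].isin(intervention_list)
# Rows that are missing: real NaN or the literal string 'NaN'
m2 = df['Intervention'].isna() | df['Intervention'].eq('NaN')

# Conditions are checked in order; unmatched rows keep their original value
df['Intervention'] = np.select([m1, m2],
                               ['Yes Intervention', 'No Intervention'],
                               default=df['Intervention'])
print(df)
```

With this sample every row matches m1 or m2, so the result should equal the df_new frame from the question.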
Upvotes: 1