Reputation: 1571
I want to do something in Python very similar to this question from R users. My intention is to create a new column whose values are based on conditions on other columns.
For example:
import numpy as np
import pandas as pd
d = {'year': [2010, 2011, 2013, 2014], 'PD': [0.5, 0.8, 0.9, np.nan], 'PD_thresh': [0.7, 0.8, 0.9, 0.7]}
df_temp = pd.DataFrame(data=d)
Now I want to create a condition that says:
pseudo-code:
if for year X the value of PD is greater than or equal to the value of PD_thresh
then set 0 in a new column y_pseudo
otherwise set 1
My expected outcome is this:
df_temp
Out[57]:
   year   PD  PD_thresh  y_pseudo
0  2010  0.5        0.7       0.0
1  2011  0.6        0.7       0.0
2  2013  0.9        0.8       1.0
3  2014  NaN        0.7       NaN
Upvotes: 2
Views: 610
Reputation: 86
Your data d differs from your expected outcome, and I think you meant 1 if PD is greater than or equal to the threshold, not the other way around, so I have this:
y = [a if np.isnan(a) else 1 if a>=b else 0 for a,b in zip(df_temp.PD,df_temp.PD_thresh)]
df_temp['y_pseudo'] = y
Output:
   year   PD  PD_thresh  y_pseudo
0  2010  0.5        0.7       0.0
1  2011  0.8        0.8       1.0
2  2013  0.9        0.9       1.0
3  2014  NaN        0.7       NaN
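The explicit np.isnan check matters because any comparison with NaN evaluates to False, so a plain >= would label the missing row 0 instead of leaving it missing. A minimal sketch of that behaviour, reusing the question's data (the name naive is only illustrative and not part of the answer above):

```python
import numpy as np
import pandas as pd

df_temp = pd.DataFrame({
    'year': [2010, 2011, 2013, 2014],
    'PD': [0.5, 0.8, 0.9, np.nan],
    'PD_thresh': [0.7, 0.8, 0.9, 0.7],
})

# Comparing NaN with anything is False, so the 2014 row silently becomes 0.
naive = (df_temp['PD'] >= df_temp['PD_thresh']).astype(int)
print(naive.tolist())  # [0, 1, 1, 0]

# The list comprehension passes NaN through for that row instead.
y = [a if np.isnan(a) else 1 if a >= b else 0
     for a, b in zip(df_temp.PD, df_temp.PD_thresh)]
print(y)
```

The last element of y stays NaN, which is why the resulting column is float rather than integer.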
Upvotes: 1
Reputation: 863531
Use numpy.select with isna and ge:
m1 = df_temp['PD'].isna()
m2 = df_temp['PD'].ge(df_temp['PD_thresh'])
df_temp['y_pseudo'] = np.select([m1, m2], [np.nan, 1], default=0)
print (df_temp)
   year   PD  PD_thresh  y_pseudo
0  2010  0.5        0.7       0.0
1  2011  0.8        0.8       1.0
2  2013  0.9        0.9       1.0
3  2014  NaN        0.7       NaN
Another solution is to convert the mask to integers, mapping True/False to 1/0, and assign only the non-missing rows selected by notna:
m2 = df_temp['PD'].ge(df_temp['PD_thresh'])
m3 = df_temp['PD'].notna()
df_temp.loc[m3, 'y_pseudo'] = m2[m3].astype(int)
print (df_temp)
   year   PD  PD_thresh  y_pseudo
0  2010  0.5        0.7       0.0
1  2011  0.8        0.8       1.0
2  2013  0.9        0.9       1.0
3  2014  NaN        0.7       NaN
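As a quick sanity check, both approaches can be run side by side on the question's data to confirm they yield the same column. This is only a sketch: the names via_select, via_loc, out and same are illustrative, and the second approach is applied to a copy so the two results stay independent.

```python
import numpy as np
import pandas as pd

df_temp = pd.DataFrame({
    'year': [2010, 2011, 2013, 2014],
    'PD': [0.5, 0.8, 0.9, np.nan],
    'PD_thresh': [0.7, 0.8, 0.9, 0.7],
})

m1 = df_temp['PD'].isna()                    # rows where PD is missing
m2 = df_temp['PD'].ge(df_temp['PD_thresh'])  # PD >= PD_thresh (False for NaN)
m3 = df_temp['PD'].notna()                   # rows where PD is present

# Approach 1: numpy.select with NaN for missing rows, 1 where m2 holds, else 0.
via_select = np.select([m1, m2], [np.nan, 1], default=0)

# Approach 2: assign the integer mask only to non-missing rows of a copy.
out = df_temp.copy()
out.loc[m3, 'y_pseudo'] = m2[m3].astype(int)
via_loc = out['y_pseudo'].to_numpy(dtype=float)

# equal_nan=True treats NaN in the same position as equal.
same = np.array_equal(via_select, via_loc, equal_nan=True)
print(same)
```

In both cases the column ends up as float, since NaN cannot be stored in a plain integer column.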
Upvotes: 4