Reputation: 111
I have a dataframe with 11 columns: Status1-Status5, Time1-Time5 & Time_Min
df = pd.DataFrame([[100,200,150,400,500,'a','b','a','c','a',100], [300,400,200,500,250,'b','b','c','c','c',200]], columns=['TIME_1', 'TIME_2', 'TIME_3', 'TIME_4', 'TIME_5','STATUS_1','STATUS_2','STATUS_3','STATUS_4','STATUS_5','TIME_MIN'])
I would like to reproduce a code I have in SAS currently which does the following
IF TIME_1 = TIME_MIN THEN STATUS = STATUS_1;
ELSE IF TIME_2 = TIME_MIN THEN STATUS = STATUS_2;
ELSE IF TIME_3 = TIME_MIN THEN STATUS = STATUS_3;
ELSE IF TIME_4 = TIME_MIN THEN STATUS = STATUS_4;
ELSE STATUS = STATUS_5;
Expected output for column STATUS would be
['a','c']
I tried building something along these lines (which would need to be extended with else ifs)
df['STATUS'] = [a if x == y else b for x,y,a,b in df[['TIME_MIN','TIME_1','STATUS_1','STATUS_2']]]
But this just gives an error. I'm sure it's a simple fix, but I can't quite figure it out.
Upvotes: 8
Views: 26112
Reputation: 250
You may use conditions and choices
df = pd.DataFrame([[100,200,150,400,500,'a','b','a','c','a',100], [300,400,200,500,250,'b','b','c','c','c',200]], columns=['TIME_1', 'TIME_2', 'TIME_3', 'TIME_4', 'TIME_5','STATUS_1','STATUS_2','STATUS_3','STATUS_4','STATUS_5','TIME_MIN'])
condition= [df['TIME_1'] == df['TIME_MIN'],
df['TIME_2'] == df['TIME_MIN'],
df['TIME_3'] == df['TIME_MIN'],
df['TIME_4'] == df['TIME_MIN'],
df['TIME_4'] == df['TIME_MIN']]
choice= [df['STATUS_1'],df['STATUS_2'],df['STATUS_3'],df['STATUS_4'],df['STATUS_5']]
df['STATUS'] =np.select(condition,choice,default="")
col_required=['TIME_1','TIME_2','TIME_3','TIME_4','TIME_5','TIME_MIN','STATUS']
df=df[col_required]
df
output
TIME_1 TIME_2 TIME_3 TIME_4 TIME_5 TIME_MIN STATUS
0 100 200 150 400 500 100 a
1 300 400 200 500 250 200 c
Upvotes: 0
Reputation: 19114
Not very pretty but you can use equality broadcasting with the .eq
method.
m = df.iloc[:, :5].eq(df['TIME_MIN'], axis=0)
m.columns = 'STATUS_' + m.columns.str.extract('TIME_(.*)')
df['STATUS'] = df[m].bfill(axis=1).iloc[:, 0]
Upvotes: 2
Reputation: 38415
You can write a function
def get_status(df):
if df['TIME_1'] == df['TIME_MIN']:
return df['STATUS_1']
elif df['TIME_2'] == df['TIME_MIN']:
return df['STATUS_2']
elif df['TIME_3'] == df['TIME_MIN']:
return df['STATUS_3']
elif df['TIME_4'] == df['TIME_MIN']:
return df['STATUS_4']
else:
return df['STATUS_5']
df['STATUS'] = df.apply(get_status, axis = 1)
Or use a very-nested np.where,
df['STATUS'] = np.where(df['TIME_1'] == df['TIME_MIN'], df['STATUS_1'],\
np.where(df['TIME_2'] == df['TIME_MIN'], df['STATUS_2'],\
np.where(df['TIME_3'] == df['TIME_MIN'], df['STATUS_3'],\
np.where(df['TIME_4'] == df['TIME_MIN'], df['STATUS_4'], df['STATUS_5']))))
Upvotes: 12