Reputation: 1
Currently I'm returning column name of the max value in the each row.
df['Active'] = df.idxmax(axis=1)
How do I take into account the Priority for each column? e.g. for Row 0, the Active column should have opC since it has a higher priority than opA. (Also Priority row shouldn't return anything in the Active column).
Update: Follow up scenario. Adding an additional row called 'minOccurrence'. Here's an example of it. Since opD doesn't have 3 straight "Actives" it isn't active at index 1 or 2 where previously it was Active based on 'Priority' column only.
df1 = pd.DataFrame({'opA': [1,1,1,1,0],
'opB': [1,1,1,0,1],
'opC': [1,1,1,1,2],
'opD': [0,1,1,0,3],
'Active': ['opC','opD', 'opD', 'opC', 0]})
df1 = df1.rename(index={df1.last_valid_index() : 'Priority'})
df1.loc['Priority','Active'] = ''
print(df1)
df1 = pd.DataFrame({'opA': [1,1,1,1,0,0],
'opB': [1,1,1,0,1,0],
'opC': [1,1,1,1,2,0],
'opD': [0,1,1,0,3,3],
'Active': ['opC','opC', 'opC', 'opC', 0,0]})
df1 = df1.rename(index={df1.last_valid_index() - 1 : 'Priority'})
df1 = df1.rename(index={df1.last_valid_index() : 'minOccurrence'})
df1.loc['Priority','Active'] = ''
df1.loc['minOccurrence','Active'] = ''
print(df1)
vs. if opD had a 1 at index 0.
df1 = pd.DataFrame({'opA': [1,1,1,1,0,0],
'opB': [1,1,1,0,1,0],
'opC': [1,1,1,1,2,0],
'opD': [1,1,1,0,3,3],
'Active': ['opD','opD', 'opD', 'opC', 0,0]})
df1 = df1.rename(index={df1.last_valid_index() - 1 : 'Priority'})
df1 = df1.rename(index={df1.last_valid_index() : 'minOccurrence'})
df1.loc['Priority','Active'] = ''
df1.loc['minOccurrence','Active'] = ''
print(df1)
Upvotes: 0
Views: 352
Reputation: 260420
You can do everything in a single shot using indexing.
Using multiplication by the priority as suggested by @Meelad:
df['Active'] = (df
.loc[df.index!='Priority']
.mul(df.loc['Priority'])
.idxmax(1)
)
Or by sorting the columns as suggested by @Arnau:
df['Active'] = (df
.loc[df.index!='Priority']
.sort_index(axis=1, key=lambda x: -df.loc['Priority',x])
.idxmax(1)
)
Reproducible input:
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0,2,(13,4)),
columns=['opA', 'opB', 'opC', 'opD'])
df.loc['Priority'] = range(4)
Output:
opA opB opC opD Active
0 0 1 1 0 opC
1 1 1 1 1 opD
2 1 1 1 0 opC
3 0 1 0 0 opB
4 0 0 0 1 opD
5 0 1 1 0 opC
6 0 1 1 1 opD
7 1 0 1 0 opC
8 1 0 1 1 opD
9 0 1 1 0 opC
10 0 1 0 1 opD
11 1 1 1 1 opD
Priority 0 1 2 3 NaN
Upvotes: 0
Reputation: 36
multiply column index by row column value , then pick up maximum result and sum all the row values , put it in new column , sort column.
Upvotes: 0
Reputation: 741
You need to resort the columns before using idxmax
temp_cols = df.columns
df = df.sort_index(axis=1,key=lambda x:df.loc['Priority',x],ascending=False)
df['Active'] = df.idxmax(axis=1)
df = df[list(temp_cols)+['Active']]
df.loc['Priority','Active'] = ''
Upvotes: 1