Reputation: 3
I have a dataset with some columns, I want to create another column, where values are the column name of the variable with the highest value BUT different from 1
For Example:
df = pd.DataFrame({'A': [1, 0.2, 0.1, 0],
'B': [0.2,1, 0, 0.5],
'C': [1, 0.4, 0.3, 1]},
index=['1', '2', '3', '4'])
df
index | A | B | C |
---|---|---|---|
1 | 1.0 | 0.2 | 1.0 |
2 | 0.2 | 1.0 | 0.4 |
3 | 0.1 | 0.0 | 0.3 |
4 | 0.0 | 0.5 | 1.0 |
Should give an output like
index | A | B | C | NEWCOL |
---|---|---|---|---|
1 | 1.0 | 0.2 | 1.0 | B |
2 | 0.2 | 0.3 | 0.1 | C |
3 | 0.1 | 0.4 | 0.2 | B |
4 | 0.0 | 0.5 | 1.0 | B |
df2['newcol'] = df2.idxmax(axis=1) if df2.max(index=1) != 1
but didn't work
Upvotes: 0
Views: 32
Reputation: 11650
here is one way to do it
# filter out the data that is 1 and find the id of the max value using idxmax
df['newcol']=df[~df.isin([1])].idxmax(axis=1)
df
A B C newcol
1 1.0 0.2 1.0 B
2 0.2 1.0 0.4 C
3 0.1 0.0 0.3 C
4 0.0 0.5 1.0 B
PS: your input, starting and expected data don't match. The above is based on the input DF
Upvotes: 0