Reputation: 2181
Original dataframe:
ix x y z
0 3 4 1
1 2 0 6
2 7 1 0
3 0 0 0
Should transform into:
ix x y z
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
As you can see, I'm taking the max value in each row and setting it to 1, while the other values in that row become 0. Also, you'll notice that row 3 stays the same, since its values are all 0.
So, I've been able to extract the index of the max value using:
x.idxmax(axis = 1)
But I'm not sure what to do with the max indices. I was thinking of using np.where, but I don't think there is a conditional statement I can use.
Any help would be much appreciated.
Upvotes: 4
Views: 2700
Reputation: 153470
Use:
df.eq(df.where(df != 0).max(1), axis=0).astype(int)
where df is:
x y z
ix
0 3.0 4.0 1.0
1 2.0 1.0 6.0
2 7.0 1.0 6.0
3 0.0 0.0 0.0
4 4.0 0.0 4.0
Output:
x y z
ix
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 1 0 1
Another method uses rank:
df.where(df!=0).rank(1, ascending=False, method='dense').eq(1).astype(int)
Output:
x y z
ix
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
4 1 0 1
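To make the role of the `where` step visible, here is a minimal self-contained sketch of the first one-liner applied to the question's data. Masking zeros to NaN means an all-zero row has a NaN maximum, and since nothing compares equal to NaN, that row stays all zeros. The function name `one_hot_row_max` is hypothetical and only used for illustration.

```python
import pandas as pd


def one_hot_row_max(df):
    """Flag each row's maximum with 1 and everything else with 0.

    Rows that contain only zeros are left as all zeros.
    """
    masked = df.where(df != 0)        # zeros become NaN
    row_max = masked.max(axis=1)      # NaN for rows that were all zero
    # Comparing against NaN is always False, so all-zero rows stay 0.
    return df.eq(row_max, axis=0).astype(int)


# Data taken from the question.
df = pd.DataFrame({"x": [3, 2, 7, 0], "y": [4, 0, 1, 0], "z": [1, 6, 0, 0]})
result = one_hot_row_max(df)
print(result)
```

Ties within a row are all flagged, which matches row 4 in the output above.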
Upvotes: 3
Reputation: 476813
A rather inelegant way to do it is the following:
(df.T.max() == df.T).T.astype(int)
Here we calculate the row-wise maximum, compare it with the values (which yields True/False), and then convert the result to ints.
This generates:
>>> (df.T.max() == df.T).T.astype(int)
a b c
0 0 1 0
1 0 0 1
2 1 0 0
The .T is necessary, since this would otherwise calculate the column-wise maximum.
Or, as @AChampion says, we can calculate the row-wise maximum with .max(axis=1) and then use df.eq(..) to do the equality check row-wise as well:
>>> df.eq(df.max(axis=1), axis=0).astype(int)
a b c
0 0 1 0
1 0 0 1
2 1 0 0
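One caveat with the plain comparison, sketched here on the question's four-row frame (column names `x`, `y`, `z` are taken from the question): an all-zero row has a maximum of 0, so every entry in it equals the maximum and the whole row turns into 1s.

```python
import pandas as pd

# Data taken from the question; the last row is all zeros.
df = pd.DataFrame({"x": [3, 2, 7, 0], "y": [4, 0, 1, 0], "z": [1, 6, 0, 0]})

# Plain row-wise comparison, with no special handling for zero rows.
naive = df.eq(df.max(axis=1), axis=0).astype(int)
print(naive)
```

The first three rows come out as expected, but the last row becomes 1 1 1 rather than 0 0 0.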
EDIT: updating only non-zero rows
One option is to build a boolean mask of the rows that contain at least one non-zero value and update only those rows, so that all-zero rows are left untouched:
fl = (df != 0).any(axis=1)
df[fl] = df[fl].eq(df[fl].max(axis=1), axis=0).astype(int)
Applied to a frame whose last row is all zeros:
>>> df = pd.DataFrame([[3, 4, 1], [2, 1, 6], [7, 1, 6], [0, 0, 0]], columns=["a", "b", "c"])
>>> fl = (df != 0).any(axis=1)
>>> df[fl] = df[fl].eq(df[fl].max(axis=1), axis=0).astype(int)
>>> df
a b c
0 0 1 0
1 0 0 1
2 1 0 0
3 0 0 0
Upvotes: 1
Reputation: 57033
First, locate the rows of the dataframe that contain at least one non-zero value. Then find the maximum of each of those rows and compare it with the values in that row:
affected = (df != 0).any(axis=1)
nz = df[affected]
df[affected] = (nz.T == nz.max(axis=1)).T.astype(int)
# x y z
#0 0 1 0
#1 0 0 1
#2 1 0 0
#3 0 0 0
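Since the question mentions `np.where`, the same idea can also be written as a single condition, shown here as a sketch of a variant rather than the code above. It assumes non-negative data as in the question, and the intermediate names `is_row_max` and `is_nonzero` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Data taken from the question.
df = pd.DataFrame({"x": [3, 2, 7, 0], "y": [4, 0, 1, 0], "z": [1, 6, 0, 0]})

is_row_max = df.eq(df.max(axis=1), axis=0)  # True where a cell equals its row's maximum
is_nonzero = df.ne(0)                       # True where a cell is not zero

# np.where returns a plain NumPy array, so wrap it back into a DataFrame.
flags = np.where((is_row_max & is_nonzero).to_numpy(), 1, 0)
result = pd.DataFrame(flags, index=df.index, columns=df.columns)
print(result)
```

Unlike the assignment above, this builds a new frame and leaves `df` unchanged.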
Upvotes: 4