madsthaks

Reputation: 2181

How do I set the max value of each dataframe row to 1 and the rest of the values to 0?

Original dataframe:

ix x  y  z    
0  3  4  1 
1  2  0  6
2  7  1  0
3  0  0  0

Should transform into:

ix x  y  z    
0  0  1  0 
1  0  0  1
2  1  0  0
3  0  0  0

As you can see, I'm taking the max value in each row and setting it to 1, while the other values in that row are set to 0. Also, you'll notice that row 3 stays the same, since all of its values are 0.

So, I've been able to extract the index of the max value using:

x.idxmax(axis = 1)

But I'm not sure what to do with the max indices. I was thinking of using np.where, but I don't see a conditional statement I could use.

Any help would be much appreciated.

Upvotes: 4

Views: 2700

Answers (3)

Scott Boston

Reputation: 153470

Use:

df.eq(df.where(df != 0).max(1), axis=0).astype(int)

where df,

      x    y    z
ix               
0   3.0  4.0  1.0
1   2.0  1.0  6.0
2   7.0  1.0  6.0
3   0.0  0.0  0.0
4   4.0  0.0  4.0

Output:

    x  y  z
ix         
0   0  1  0
1   0  0  1
2   1  0  0
3   0  0  0
4   1  0  1

Another method uses rank:

df.where(df!=0).rank(1, ascending=False, method='dense').eq(1).astype(int)

Output:

    x  y  z
ix         
0   0  1  0
1   0  0  1
2   1  0  0
3   0  0  0
4   1  0  1
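
For reference, here is a self-contained sketch of the first expression, split into steps. The code that builds the frame is an assumption for illustration; only the values come from the table above.

```python
import pandas as pd

# Rebuild the example frame shown above (the construction is illustrative).
df = pd.DataFrame(
    {
        "x": [3.0, 2.0, 7.0, 0.0, 4.0],
        "y": [4.0, 1.0, 1.0, 0.0, 0.0],
        "z": [1.0, 6.0, 6.0, 0.0, 4.0],
    }
)
df.index.name = "ix"

# Replace zeros with NaN, so an all-zero row ends up with a NaN maximum.
row_max = df.where(df != 0).max(axis=1)

# Comparing against NaN is always False, so the all-zero row stays 0.
# Ties (row 4) are all marked with 1.
result = df.eq(row_max, axis=0).astype(int)
print(result)
```

The intermediate `row_max` is only there to make the NaN step visible; the one-liner does the same thing.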

Upvotes: 3

willeM_ Van Onsem

Reputation: 476813

A rather inelegant way to do it is the following:

(df.T.max() == df.T).T.astype(int)

Here we calculate the row-wise maximum, compare it with the values (which yields True/False), and then convert the result to ints.

This generates:

>>> (df.T.max() == df.T).T.astype(int)
   a  b  c
0  0  1  0
1  0  0  1
2  1  0  0

The .T is necessary, since this will otherwise calculate the columnwise maximum.

Alternatively, as @AChampion suggests, we can calculate the row-wise maximum with .max(axis=1) and then use df.eq(..) with axis=0 to compare each row against its own maximum:

>>> df.eq(df.max(axis=1), axis=0).astype(int)
   a  b  c
0  0  1  0
1  0  0  1
2  1  0  0
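
Note that both expressions above treat an all-zero row like any other row: every value equals the row maximum (0), so the whole row becomes 1. A minimal sketch of that behaviour, assuming a frame with one extra all-zero row (the variable name `naive` is only illustrative):

```python
import pandas as pd

# Three ordinary rows plus one all-zero row.
df = pd.DataFrame(
    [[3, 4, 1], [2, 1, 6], [7, 1, 6], [0, 0, 0]],
    columns=["a", "b", "c"],
)

# In the last row every value equals the row maximum (0),
# so the comparison is True everywhere and the row becomes 1, 1, 1.
naive = df.eq(df.max(axis=1), axis=0).astype(int)
print(naive)
```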

EDIT: updating only non-zero rows

We can use a boolean mask so that all-zero rows are left unchanged:

fl = (df != 0).any(axis=1)
df[fl] = df[fl].eq(df[fl].max(axis=1), axis=0).astype(int)

A full session:

>>> df = pd.DataFrame([[3, 4, 1], [2, 1, 6], [7, 1, 6], [0, 0, 0]], columns=["a", "b", "c"])
>>> fl = (df != 0).any(axis=1)
>>> df[fl] = df[fl].eq(df[fl].max(axis=1), axis=0).astype(int)
>>> df
   a  b  c
0  0  1  0
1  0  0  1
2  1  0  0
3  0  0  0

Upvotes: 1

DYZ

Reputation: 57033

First, locate the part of the dataframe that has non-zero rows. Then find the maximal values and compare them to the matrix:

affected = (df != 0).any(axis=1)
nz = df[affected]
df[affected] = (nz.T == nz.max(axis=1)).T.astype(int)
#    x  y  z
#0   0  1  0
#1   0  0  1
#2   1  0  0
#3   0  0  0
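
If modifying df in place is not wanted, the same idea can be sketched as a function that returns a new frame. The helper name `one_hot_row_max` is hypothetical, and multiplying by the mask is just one possible way to zero out the all-zero rows:

```python
import pandas as pd


def one_hot_row_max(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: 1 where a value equals its row maximum, else 0.

    Rows that contain only zeros stay all zero. The input is not modified.
    """
    affected = (df != 0).any(axis=1)        # rows with at least one non-zero value
    is_max = df.eq(df.max(axis=1), axis=0)  # True where value == row maximum
    # Multiply each row by 0 or 1 so that all-zero rows are reset to 0.
    return is_max.astype(int).mul(affected.astype(int), axis=0)


# The frame from the question.
df = pd.DataFrame({"x": [3, 2, 7, 0], "y": [4, 0, 1, 0], "z": [1, 6, 0, 0]})
result = one_hot_row_max(df)
print(result)
```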

Upvotes: 4
