Convert the last non-zero value to 0 for each row in a pandas DataFrame

Question

I'm trying to modify my DataFrame so that the last non-zero value in each row of a label-encoded feature is set to 0. For example, I have this DataFrame, where the top row holds the column labels and the first column is the index:

df
   1  2  3  4  5  6  7  8  9  10
0  0  1  0  0  0  0  0  0  1   1
1  0  0  0  1  0  0  0  0  0   0
2  0  0  0  0  0  0  0  0  1   0

Columns 1-10 are the ones that have been encoded. I want to convert this DataFrame to the following, without changing anything else:

   1  2  3  4  5  6  7  8  9  10
0  0  1  0  0  0  0  0  0  1   0
1  0  0  0  0  0  0  0  0  0   0
2  0  0  0  0  0  0  0  0  0   0

So the last non-zero value in each row should be converted to 0. I was thinking of using the last_valid_index method, but that would also take in the remaining columns and change them, which I don't want. Any help is appreciated.

cs95 · Accepted Answer

You can use cumsum to build a boolean mask, then set every cell outside the mask to zero.

v = df.cumsum(axis=1)
df[v.lt(v.max(axis=1), axis=0)].fillna(0, downcast='infer')

   1  2  3  4  5  6  7  8  9  10
0  0  1  0  0  0  0  0  0  1   0
1  0  0  0  0  0  0  0  0  0   0
2  0  0  0  0  0  0  0  0  0   0
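
For reference, here is a self-contained sketch of the same cumsum mask that you can paste and run. It rebuilds the example frame from the question and uses DataFrame.where with a fill value of 0 instead of boolean indexing followed by fillna. That substitution is my own variation, not part of the snippet above: it never introduces NaN, so no downcast step is needed (as far as I know, newer pandas releases warn about the downcast argument of fillna). The names keep and result are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (columns labelled 1..10).
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
    columns=range(1, 11),
)

# Running total along each row; it first reaches the row maximum at the last 1.
v = df.cumsum(axis=1)

# True for every cell that comes strictly before the last 1 in its row.
keep = v.lt(v.max(axis=1), axis=0)

# Keep those cells and write 0 everywhere else. No NaN is created,
# so the integer dtype of the frame is preserved.
result = df.where(keep, 0)
print(result)
```

This assumes the encoded columns hold only 0 and 1 (or at least no negative values), since the running total has to be non-decreasing for the comparison against the row maximum to work.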

Another similar option is to reverse the columns before calling cumsum, which lets you do this in a single line.

df[~df.iloc[:, ::-1].cumsum(1).le(1)].fillna(0, downcast='infer')

   1  2  3  4  5  6  7  8  9  10
0  0  1  0  0  0  0  0  0  1   0
1  0  0  0  0  0  0  0  0  0   0
2  0  0  0  0  0  0  0  0  0   0
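
The reversed one-liner relies on the values being 0 or 1. If your rows can hold other non-zero values, a variation along the following lines should still pick out only the last non-zero cell. This is a sketch, not something from the question: the helper name zero_last_nonzero and the second test frame are made up for illustration.

```python
import pandas as pd

def zero_last_nonzero(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the last non-zero entry of each row set to 0."""
    nonzero = frame.ne(0)
    # Count non-zero cells from the right, then flip back to the original order.
    from_right = nonzero.astype(int).iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]
    # The last non-zero cell is non-zero and has a right-to-left count of 1.
    is_last = nonzero & from_right.eq(1)
    return frame.mask(is_last, 0)

# The frame from the question.
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
    columns=range(1, 11),
)
print(zero_last_nonzero(df))

# A made-up frame with non-binary values and an all-zero row.
mixed = pd.DataFrame([[0, 5, 0, -2, 0], [0, 0, 0, 0, 0]])
print(zero_last_nonzero(mixed))
```

Rows that are entirely zero are left unchanged, because no cell in them satisfies the is_last condition.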

If you have more columns, apply these operations to a slice containing only the encoded columns, then assign the result back.

u = df.iloc[:, :10]
df[u.columns] = u[~u.iloc[:, ::-1].cumsum(1).le(1)].fillna(0, downcast='infer')
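
To see that the other columns are left alone, here is a runnable sketch of the slice-and-assign-back idea. The extra column named target and its values are hypothetical, added only to have something that must not change. I also use where(keep, 0) in place of boolean indexing plus fillna, which is my own substitution and should give the same result on 0/1 data.

```python
import pandas as pd

# Encoded columns 1..10 from the question.
df = pd.DataFrame(
    [[0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
     [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]],
    columns=range(1, 11),
)
df["target"] = [7, 8, 9]  # hypothetical extra column that must stay untouched

# Work only on the first ten (encoded) columns.
u = df.iloc[:, :10]

# Reversed running total: it is <= 1 from the last 1 in a row to the row's end.
# Negating gives True for the cells that come before the last 1.
keep = ~u.iloc[:, ::-1].cumsum(axis=1).le(1)

# where() aligns the mask on column labels, so the reversed order does not matter.
df[u.columns] = u.where(keep, 0)
print(df)
```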
