Alex P

Reputation: 137

Python, pandas find the length of each row based on a stop criteria

I have a dataset where -1 marks the point at which I need to stop reading. For example, the dataset looks like this:

          0   1   2    3    4  5
0       58  68  58   59   -1 -1
1       59  69  59   -1   -1 -1
2       93  94  93   33   -1 -1
3       58  59  58   68   -1 -1
4       92  94  92   33   -1 -1

where the first -1 in a row (column 4 in the first row) means stop reading elements.

More precisely, I want to return the length of each row (how many elements exist before the first -1 appears), for all rows.

So first row has length of 4 (4 elements until the -1). Row two has length 3. Row three has length 4 and so on.

To do that, I think I need a way in pandas to get, per row, the index where the first -1 occurs.

How can I do something like that in a nice way in pandas (that is, avoiding a long for loop)?

I would like to thank you in advance for your help. Regards Alex

Upvotes: 1

Views: 531

Answers (2)

anky

Reputation: 75100

Another approach is to count the -1 entries in each row (the sum of the True values from df.eq(-1)) and subtract that count from the number of columns, df.shape[1]:

df.assign(new=df.shape[1]-df.eq(-1).sum(axis=1))

    0   1   2   3  4  5  new
0  58  68  58  59 -1 -1    4
1  59  69  59  -1 -1 -1    3
2  93  94  93  33 -1 -1    4
3  58  59  58  68 -1 -1    4
4  92  94  92  33 -1 -1    4
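
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name `lengths` is only illustrative. Like the one-liner above, it relies on every -1 sitting at the end of its row.

```python
import pandas as pd

# Sample data copied from the question; -1 marks the stop point.
df = pd.DataFrame([
    [58, 68, 58, 59, -1, -1],
    [59, 69, 59, -1, -1, -1],
    [93, 94, 93, 33, -1, -1],
    [58, 59, 58, 68, -1, -1],
    [92, 94, 92, 33, -1, -1],
])

# Number of columns minus the number of -1 entries in each row.
lengths = df.shape[1] - df.eq(-1).sum(axis=1)
print(lengths.tolist())  # [4, 3, 4, 4, 4]
```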

Upvotes: 2

cs95

Reputation: 402844

numpy.argmax

You can compare the DataFrame values to -1, then use numpy.argmax to get the column position of the first -1 in each row, which equals the number of values before it.

(df.values == -1).argmax(axis=1)
# array([4, 3, 4, 4, 4])

Assigning this back,

df['num_vals'] = (df.values == -1).argmax(axis=1)
df

    0   1   2   3  4  5  num_vals
0  58  68  58  59 -1 -1         4
1  59  69  59  -1 -1 -1         3
2  93  94  93  33 -1 -1         4
3  58  59  58  68 -1 -1         4
4  92  94  92  33 -1 -1         4

This always returns the column position of the first -1 in each row, so it is more general: it still gives the right length if other values follow the first -1.
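
One edge case worth knowing: argmax returns 0 for a row that has no True values, so a row without any -1 would be reported as length 0. A possible guard, sketched under the assumption that such a row should count as full length, is shown below. The third row and the names `is_stop`, `first_stop`, `has_stop` and `lengths` are hypothetical and not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [58, 68, 58, 59, -1, -1],
    [59, 69, 59, -1, -1, -1],
    [1, 2, 3, 4, 5, 6],  # hypothetical row with no -1 at all
])

is_stop = df.to_numpy() == -1
first_stop = is_stop.argmax(axis=1)  # 0 for a row with no -1
has_stop = is_stop.any(axis=1)       # False for a row with no -1

# Assumption: a row without a stop marker is treated as full length.
lengths = np.where(has_stop, first_stop, df.shape[1])
print(lengths.tolist())  # [4, 3, 6]
```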


.cumsum(axis=1) + .max(axis=1)

df['num_vals'] = (df != -1).cumsum(axis=1).max(axis=1)
df

    0   1   2   3  4  5  num_vals
0  58  68  58  59 -1 -1         4
1  59  69  59  -1 -1 -1         3
2  93  94  93  33 -1 -1         4
3  58  59  58  68 -1 -1         4
4  92  94  92  33 -1 -1         4

This counts every value that is not -1, so it assumes the -1 values always sit at the end of their rows.
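
To see where the two methods diverge, here is a small sketch with a made-up first row in which a value follows the first -1. The data and the names `counted` and `first_stop` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    [58, 68, -1, 59, -1, -1],  # hypothetical: a value appears after the first -1
    [59, 69, 59, -1, -1, -1],
])

# cumsum + max counts every non -1 value in the row.
counted = (df != -1).cumsum(axis=1).max(axis=1)

# argmax gives the position of the first -1.
first_stop = (df.to_numpy() == -1).argmax(axis=1)

print(counted.tolist())     # [3, 3]
print(first_stop.tolist())  # [2, 3]
```

If the data can contain values after the first -1, the argmax result is the one that matches "how many elements until -1 appears".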

Upvotes: 2
