Reputation: 137
I have a dataset where -1 marks the point where I need to stop reading. For example, the dataset looks like this:
0 1 2 3 4 5
0 58 68 58 59 -1 -1
1 59 69 59 -1 -1 -1
2 93 94 93 33 -1 -1
3 58 59 58 68 -1 -1
4 92 94 92 33 -1 -1
Here, the first -1 in a row (column 4 in the first row) means stop reading elements.
More precisely, I want to return the length of each row (how many elements there are before the first -1 appears), for all rows.
So the first row has a length of 4 (4 elements before the -1), the second row has a length of 3, the third row has a length of 4, and so on.
To do that, I think I need a way in pandas to get, per row, the index where the first -1 occurs.
How can I do this in a nice way in pandas (that is, avoiding a long for loop)?
Thank you in advance for your help. Regards, Alex
Upvotes: 1
Views: 531
Reputation: 75100
Another approach is to subtract the number of -1 values in each row (the row-wise sum of True from df.eq(-1)) from df.shape[1]:
df.assign(new=df.shape[1]-df.eq(-1).sum(axis=1))
0 1 2 3 4 5 new
0 58 68 58 59 -1 -1 4
1 59 69 59 -1 -1 -1 3
2 93 94 93 33 -1 -1 4
3 58 59 58 68 -1 -1 4
4 92 94 92 33 -1 -1 4
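For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same subtraction. The DataFrame is rebuilt by hand from the sample in the question, and the variable name lengths is only illustrative. Like the one-liner above, it assumes every -1 sits at the end of its row.

```python
import pandas as pd

# Sample data copied from the question; -1 marks "stop reading".
df = pd.DataFrame(
    [
        [58, 68, 58, 59, -1, -1],
        [59, 69, 59, -1, -1, -1],
        [93, 94, 93, 33, -1, -1],
        [58, 59, 58, 68, -1, -1],
        [92, 94, 92, 33, -1, -1],
    ]
)

# df.eq(-1) is a boolean frame; summing along axis=1 counts the -1 values per row.
# Subtracting that count from the number of columns leaves the number of real values.
lengths = df.shape[1] - df.eq(-1).sum(axis=1)
print(lengths.tolist())  # [4, 3, 4, 4, 4]
```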
Upvotes: 2
Reputation: 402844
numpy.argmax
You can compare the DataFrame values to -1, then use numpy.argmax
to get the column position of the first -1 in each row, which equals the number of values before it.
(df.values == -1).argmax(axis=1)
# array([4, 3, 4, 4, 4])
Assigning this back,
df['num_vals'] = (df.values == -1).argmax(axis=1)
df
0 1 2 3 4 5 num_vals
0 58 68 58 59 -1 -1 4
1 59 69 59 -1 -1 -1 3
2 93 94 93 33 -1 -1 4
3 58 59 58 68 -1 -1 4
4 92 94 92 33 -1 -1 4
This works by always returning the column index of the first -1 found in each row, so it is more general.
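One caveat worth sketching: argmax on an all-False row returns 0, so a row that contains no -1 at all would be reported as having length 0. The snippet below is one possible guard, not part of the original answer. The third row and the fallback of df.shape[1] (treating a row without -1 as completely full) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [58, 68, 58, 59, -1, -1],
        [59, 69, 59, -1, -1, -1],
        [10, 11, 12, 13, 14, 15],  # hypothetical row with no -1 at all
    ]
)

mask = df.to_numpy() == -1

# Position of the first -1 per row; this is 0 for the row that has no -1.
first_stop = mask.argmax(axis=1)

# Assumption: a row without any -1 is full, so its length is the column count.
num_vals = np.where(mask.any(axis=1), first_stop, df.shape[1])
print(num_vals.tolist())  # [4, 3, 6]
```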
.cumsum(axis=1) + .max(axis=1)
df['num_vals'] = (df != -1).cumsum(axis=1).max(axis=1)
df
0 1 2 3 4 5 num_vals
0 58 68 58 59 -1 -1 4
1 59 69 59 -1 -1 -1 3
2 93 94 93 33 -1 -1 4
3 58 59 58 68 -1 -1 4
4 92 94 92 33 -1 -1 4
This assumes your -1 values are always at the end of their respective rows.
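To see why that assumption matters, here is a small sketch comparing the two approaches on a made-up row where a -1 appears in the middle. The second row is hypothetical and not from the question. The cumsum/max version counts every non -1 value in the row, while argmax stops at the first -1.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [58, 68, 58, 59, -1, -1],  # -1 only at the end
        [58, -1, 58, 59, -1, -1],  # hypothetical: a -1 in the middle of the row
    ]
)

# Running count of non -1 values; the row maximum is the total count of them.
counted = (df != -1).cumsum(axis=1).max(axis=1)

# Column position of the first -1 in each row.
first_stop = (df.to_numpy() == -1).argmax(axis=1)

print(counted.tolist())     # [4, 3]
print(first_stop.tolist())  # [4, 1]
```

The two results agree on the first row and differ on the second, so which one is right depends on whether values after an early -1 should still be counted.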
Upvotes: 2