Reputation: 9451
Consider a dataframe with a column like this:
sequence
1
2
3
4
5
1
2
3
1
2
3
4
5
6
7
I want to add a column that increments each time the sequence resets. The runs are of variable length.
The result should look something like this:
sequence run
1 1
2 1
3 1
4 1
5 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
5 3
6 3
7 3
Upvotes: 0
Views: 141
Reputation: 1
Use:
dataset['run'] = dataset.groupby('sequence').cumcount().add(1)
Example output:
sequence run
y 1
a 1
g 1
a 2
b 1
a 3
b 2
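A minimal sketch that reproduces the example output above, assuming the column is named sequence with no trailing space (the letter values are simply the ones shown in the output). Note that cumcount numbers the repeat occurrences of each value, so on the question's data it would not label whole runs.

```python
import pandas as pd

# Hypothetical frame built from the letters in the example output.
dataset = pd.DataFrame({"sequence": list("yagabab")})

# cumcount() numbers rows within each group starting at 0;
# add(1) shifts that to a 1-based occurrence counter.
dataset["run"] = dataset.groupby("sequence").cumcount().add(1)

print(dataset)
```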
Upvotes: 0
Reputation: 323376
Try diff followed by cumsum. A new run starts wherever the difference from the previous row is not 1:
df['run'] = df['sequence'].diff().ne(1).cumsum()
Out[349]:
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
10 3
11 3
12 3
13 3
14 3
Name: sequence, dtype: int32
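For reference, here is a self-contained sketch of the same idea split into steps, using the data from the question. The intermediate names step and starts are only illustrative. It assumes each run counts up by exactly 1, as in the question.

```python
import pandas as pd

# The column from the question.
df = pd.DataFrame({"sequence": [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7]})

# Difference from the previous row: NaN on the first row,
# 1 inside a run, and some other value where the sequence resets.
step = df["sequence"].diff()

# True on the first row of every run (NaN != 1 is also True).
starts = step.ne(1)

# Summing the booleans cumulatively gives a 1-based run number.
df["run"] = starts.cumsum()

print(df)
```

If the runs could contain gaps but always restart at 1, df['sequence'].eq(1).cumsum() would be an alternative way to mark the resets.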
Upvotes: 1