Reputation: 573
Sorry if the question is not clear; let me describe my issue in this post. I have the following dataframe:
value created_at t_diff flag_1
0 18.930542 2019-03-03 21:43:08-05:00 00:00:00 1
1 18.895210 2019-03-03 21:44:09-05:00 00:00:00 1
2 18.895210 2019-03-03 21:45:09-05:00 00:00:00 1
3 18.885010 2019-03-03 21:46:10-05:00 00:04:04 2
4 0.000000 2019-03-03 21:47:11-05:00 00:04:04 2
5 0.000000 2019-03-03 21:48:12-05:00 00:04:04 2
6 0.000000 2019-03-03 21:49:13-05:00 00:04:04 2
7 0.000000 2019-03-03 21:50:14-05:00 00:04:04 2
8 18.857025 2019-03-03 21:51:14-05:00 00:00:00 3
9 18.847290 2019-03-03 21:52:15-05:00 00:00:00 3
10 18.847290 2019-03-03 21:53:17-05:00 00:00:00 3
11 18.873283 2019-03-03 21:54:17-05:00 00:00:00 3
12 18.873283 2019-03-03 21:55:19-05:00 00:00:00 3
13 18.837677 2019-03-03 21:56:19-05:00 00:00:00 3
20 18.830170 2019-03-03 22:03:25-05:00 00:00:00 5
21 18.826149 2019-03-03 22:04:26-05:00 00:00:00 5
22 18.826149 2019-03-03 22:05:27-05:00 00:00:00 5
23 18.830795 2019-03-03 22:06:28-05:00 00:00:00 5
From the column 'flag_1', I'd like to identify the runs of rows whose values, although repeated, form an unbroken succession of consecutive numbers (here 1, 2, 3 form one run, and 5 starts another). The outcome I want looks like this:
value created_at t_diff flag_1 flag_2
0 18.930542 2019-03-03 21:43:08-05:00 00:00:00 1 1
1 18.895210 2019-03-03 21:44:09-05:00 00:00:00 1 1
2 18.895210 2019-03-03 21:45:09-05:00 00:00:00 1 1
3 18.885010 2019-03-03 21:46:10-05:00 00:04:04 2 1
4 0.000000 2019-03-03 21:47:11-05:00 00:04:04 2 1
5 0.000000 2019-03-03 21:48:12-05:00 00:04:04 2 1
6 0.000000 2019-03-03 21:49:13-05:00 00:04:04 2 1
7 0.000000 2019-03-03 21:50:14-05:00 00:04:04 2 1
8 18.857025 2019-03-03 21:51:14-05:00 00:00:00 3 1
9 18.847290 2019-03-03 21:52:15-05:00 00:00:00 3 1
10 18.847290 2019-03-03 21:53:17-05:00 00:00:00 3 1
11 18.873283 2019-03-03 21:54:17-05:00 00:00:00 3 1
12 18.873283 2019-03-03 21:55:19-05:00 00:00:00 3 1
13 18.837677 2019-03-03 21:56:19-05:00 00:00:00 3 1
20 18.830170 2019-03-03 22:03:25-05:00 00:00:00 5 2
21 18.826149 2019-03-03 22:04:26-05:00 00:00:00 5 2
22 18.826149 2019-03-03 22:05:27-05:00 00:00:00 5 2
23 18.830795 2019-03-03 22:06:28-05:00 00:00:00 5 2
The column 'flag_2' should hold a numeric identifier for each of these "successions" of consecutive, possibly repeated, numbers: 1 for the first, 2 for the second, 3 for the third, and so on.
I have been trying to do this indirectly: I took df.flag_1.unique(), used more-itertools to build a nested list, and then looped over that list, slicing the dataframe with isin from Pandas.
I'd like to know if there's a way to do all this with Pandas and without using more-itertools and the rest of my approach.
Can you help me out please? Thanks in advance!
Upvotes: 0
Views: 593
Reputation: 323226
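You can create it using diff and cumsum. The logic is that, within a run of consecutive values, the difference between neighbouring rows should never be greater than 1: in your example, flag_1 either increases by one or stays the same (no change, so the difference is 0). A difference greater than 1 therefore marks the start of a new run, and the cumulative sum of those marks, plus 1, numbers the runs.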
df.flag_1.diff().gt(1).cumsum()+1
Out[351]:
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 1
13 1
20 2
21 2
22 2
23 2
Name: flag_1, dtype: int32
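For anyone who wants to run this end to end, here is a minimal sketch that rebuilds only the flag_1 column and index from the question (the other columns are left out, which is my simplification) and spells out the intermediate steps. The names step and new_run are just illustrative.

```python
import pandas as pd

# Rebuild only flag_1 with the index shown in the question (0..13, then 20..23).
index = list(range(14)) + [20, 21, 22, 23]
flag_1 = [1] * 3 + [2] * 5 + [3] * 6 + [5] * 4
df = pd.DataFrame({"flag_1": flag_1}, index=index)

# Difference to the previous row: NaN for the first row, 0 while the value
# repeats, 1 when it moves to the next consecutive number, >1 on a gap.
step = df["flag_1"].diff()

# True only where a gap occurs; the NaN in the first row compares as False.
new_run = step.gt(1)

# Running count of gaps, shifted so that the first run is labelled 1.
df["flag_2"] = new_run.cumsum() + 1

print(df)
```

This relies on flag_1 never decreasing, as in the example data. If it could also drop between rows, a different condition than gt(1) would probably be needed.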
Upvotes: 1