Reputation: 77
I have a df that looks like this:
|Index|Value|Anomaly|
---------------------
|0 |4 | |
|1 |2 |Anomaly|
|2 |1 |Anomaly|
|3 |2 | |
|4 |6 |Anomaly|
I want to get the start and end indices of the consecutive anomaly counts so in this case, it will be [[1,2],[4]]
I understand I have to use .shift
and .cumsum
but I am lost and I hope someone would be able to enlighten me.
Upvotes: 1
Views: 76
Reputation: 59579
Get consecutive groups taking the cumsum of the Boolean Series that checks where the value is not 'Anomoly'. Use where
so that we only only take the 'Anomoly' rows. Then we can loop over the groups and grab the indices.
m = df['Anomaly'].ne('Anomaly')
[[idx[0], idx[-1]] if len(idx) > 1 else [idx[0]]
for idx in df.groupby(m.cumsum().where(~m)).groups.values()]
#[[1, 2], [4]]
Or if you want to use a much longer groupby
you can get the first and last index, then drop duplicates (to deal with streaks of only 1) and get it into a list of lists. This is much slower though
(df.reset_index().groupby(m.cumsum().where(~m))['index'].agg(['first', 'last'])
.stack()
.drop_duplicates()
.groupby(level=0).agg(list)
.tolist())
#[[1, 2], [4]]
Upvotes: 2