Getting start and end indices of string in Pandas

Question

I have a df that looks like this:

|Index|Value|Anomaly|
---------------------
|0    |4    |       |
|1    |2    |Anomaly|
|2    |1    |Anomaly|
|3    |2    |       |
|4    |6    |Anomaly|

I want to get the start and end indices of each run of consecutive anomalies, so in this case the result would be [[1,2],[4]].

I understand I have to use .shift and .cumsum but I am lost and I hope someone would be able to enlighten me.

ALollz · Accepted Answer

Get the consecutive groups by taking the cumsum of the Boolean Series that checks where the value is not 'Anomaly'. Use where so that we only take the 'Anomaly' rows. Then we can loop over the groups and grab the indices.

m = df['Anomaly'].ne('Anomaly')

[[idx[0], idx[-1]] if len(idx) > 1 else [idx[0]] 
 for idx in df.groupby(m.cumsum().where(~m)).groups.values()]
#[[1, 2], [4]]
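
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the table in the question is a DataFrame with the default integer index and that the blank Anomaly cells are empty strings (they could just as well be NaN; the ne check behaves the same either way). The names key and result are only for illustration, and the int() calls are added here just to turn NumPy integers into plain Python ints.

```python
import pandas as pd

# Assumed reconstruction of the question's frame; blank cells taken as "".
df = pd.DataFrame({
    "Value": [4, 2, 1, 2, 6],
    "Anomaly": ["", "Anomaly", "Anomaly", "", "Anomaly"],
})

# True on the rows that are NOT anomalies.
m = df["Anomaly"].ne("Anomaly")

# The running count only increases on non-anomaly rows, so it stays constant
# inside a streak of anomalies. where(~m) turns the non-anomaly rows into NaN,
# and groupby drops NaN keys by default.
key = m.cumsum().where(~m)

# One entry per streak: [start, end], or [start] for a streak of length 1.
result = [
    [int(idx[0]), int(idx[-1])] if len(idx) > 1 else [int(idx[0])]
    for idx in df.groupby(key).groups.values()
]
print(result)
```

Printing key before the groupby is a quick way to see which rows end up in the same group.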

Or, if you want to use a much longer groupby, you can get the first and last index of each group, then drop duplicates (to deal with streaks of only 1) and get it into a list of lists. This is much slower, though.

(df.reset_index().groupby(m.cumsum().where(~m))['index'].agg(['first', 'last'])
   .stack()
   .drop_duplicates()
   .groupby(level=0).agg(list)
   .tolist())
#[[1, 2], [4]]
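
Since the question mentions .shift, here is a rough sketch of a shift-based alternative. It is not part of the answer above, only an illustration of the same idea: a streak starts where a row is an anomaly and the previous row is not, and ends where a row is an anomaly and the next row is not. It assumes the same reconstructed frame as the question's table (default integer index, blank cells as empty strings); the names is_anom, starts, ends and runs are made up for this example.

```python
import pandas as pd

# Assumed reconstruction of the question's frame; blank cells taken as "".
df = pd.DataFrame({
    "Value": [4, 2, 1, 2, 6],
    "Anomaly": ["", "Anomaly", "Anomaly", "", "Anomaly"],
})

is_anom = df["Anomaly"].eq("Anomaly")

# fill_value=False treats the rows "before the first" and "after the last"
# as non-anomalies, so streaks touching either edge are still detected.
starts = df.index[is_anom & ~is_anom.shift(1, fill_value=False)]
ends = df.index[is_anom & ~is_anom.shift(-1, fill_value=False)]

# Pair each start with its end; a streak of length 1 yields a single index.
runs = [
    [int(s)] if s == e else [int(s), int(e)]
    for s, e in zip(starts, ends)
]
print(runs)
```

This avoids the groupby entirely, at the cost of computing the starts and ends as two separate masks.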
