Reputation: 708
I'm looking to understand the number of times we are in an 'Abnormal State' before we have an 'Event'. My objective is to modify my dataframe to get the following output where everytime we reach an 'event', the 'Abnormal State Grouping' resets to count from 0.
We can go through a number of 'Abnormal States' before we reach an 'Event', which is deemed a failure. (i.e. The lightbulb is switched on and off for several periods before it finally shorts out resulting in an event).
I've written the following code to get my AbnormalStateGroupings to increment into relevant groupings for my analysis which has worked fine. However, we want to 'reset' the count of our 'AbnormalStates' after each event (i.e. lightbulb failure):
dataframe['AbnormalStateGrouping'] = (dataframe['AbnormalState']!=dataframe['AbnormalState'].shift()).cumsum()
I have created an additional column which let's me know what 'event' we are at via:
dataframe['Event_Or_Not'].cumsum() #I have a boolean representation of the Event Column represented and we use .cumsum() to get the relevant groupings (i.e. 1st Event, 2nd Event, 3rd Event etc.)
I've come close previously using the following:
eventOrNot = dataframe['Event'].eq(0)
eventMask = (eventOrNot.ne(eventOrNot.shift())&eventOrNot).cumsum()
dataframe['AbnormalStatePerEvent'] =dataframe.groupby(['Event',eventMask]).cumcount().add(1)
However, this hasn't given me the desired output that I'm after (as per below). I think I'm close however - Could anyone please advise what I could try to do next so that for each lightbulb failure, the abnormal state count resets and starts counting the # of abnormal states we have gone through before the next lightbulb failure?
State I want to get to with AbnormalStateGrouping
You would note that when an 'Event' is detected, the Abnormal State count resets to 1 and then starts counting again.
Current State of Dataframe
Please find an attached data source below: https://filebin.net/ctjwk7p3gulmbgkn
Upvotes: 1
Views: 218
Reputation: 30971
I assume that your source DataFrame has only Date/Time (either string or datetime), Event (string) and AbnormalState (int) columns.
To compute your grouping column, run:
dataframe['AbnormalStateGrouping'] = dataframe.groupby(
dataframe['Event'][::-1].notnull().cumsum()).AbnormalState\
.apply(lambda grp: (grp != grp.shift()).cumsum())
The result, for your initial source data, included as a picture, is:
Date/Time Event AbnormalState AbnormalStateGrouping
0 2018-01-01 01:00 NaN 0 1
1 2018-01-01 02:00 NaN 0 1
2 2018-01-01 03:00 NaN 1 2
3 2018-01-01 04:00 NaN 1 2
4 2018-01-01 05:00 NaN 0 3
5 2018-01-01 06:00 NaN 0 3
6 2018-01-01 07:00 NaN 0 3
7 2018-01-01 08:00 NaN 1 4
8 2018-01-01 09:00 NaN 1 4
9 2018-01-01 10:00 NaN 0 5
10 2018-01-01 11:00 NaN 0 5
11 2018-01-01 12:00 NaN 0 5
12 2018-01-01 13:00 NaN 1 6
13 2018-01-01 14:00 NaN 1 6
14 2018-01-01 15:00 NaN 0 7
15 2018-01-01 16:00 Event 0 7
16 2018-01-01 17:00 NaN 1 1
17 2018-01-01 18:00 NaN 1 1
18 2018-01-01 19:00 NaN 0 2
19 2018-01-01 20:00 NaN 0 2
Note the way of grouping:
dataframe['Event'][::-1].notnull().cumsum()
Due to [::-1], cumsum function is computed from the last row to the first. Thus:
Then, to AbnormalState, separately for each group, a lambda function is applied, so each cumulative sum starts from 1 just in each group (after each Event).
The reason why I compute the cumsum for grouping in reversed order is that when you run it in normal order:
dataframe['Event'].notnull().cumsum()
then:
Try yourself both versions, without and with [::-1].
The result in normal order (without [::-1]) is that:
To check the whole result, run my code without [::-1] and you will see that the ending part of the result contains:
Date/Time Event AbnormalState AbnormalStateGrouping
14 2018-01-01 15:00:00 NaN 0 7
15 2018-01-01 16:00:00 Event 0 1
16 2018-01-01 17:00:00 NaN 1 2
17 2018-01-01 18:00:00 NaN 1 2
18 2018-01-01 19:00:00 NaN 0 3
19 2018-01-01 20:00:00 NaN 0 3
so that the Event row to has AbnormalStateGrouping == 1.
But you want this row to have AbnormalStateGrouping in a sequence of previous grouping states (in this case 7) and reset should occur from the next row on.
So the Event row should be in same group with preceding rows, what is the result of my code.
Upvotes: 2