Karthik Venkatraman
Karthik Venkatraman

Reputation: 1657

Calculate number of event occurences before a particular event and number of specific event in pandas dataframe

I am working on a data in which i need to specifically calculate the number of other events occurred before a particular event. The sample data is below

Sample

In the above image there are two PUMP RUN. So what I need to derive is the cumulative number of other events before each PUMP RUN and store in a new column for each ID value. There would be multiple ID value with multiple events along with PUMP RUN event. For example, based on the above image for First occurrence of pump run within that ID value it has to be 5 and for the second occurrence it has to be 3 and so on. Once the ID value changes, the cumulative count needs to restart.

Desired Output

Desired Output

I tried to do this with the below code however its not getting grouped or partitioned based on the ID value.

data['Count'] = data.groupby(data['EventDescription'].str.startswith('Pump RUN').cumsum()).cumcount().add(1)
data['EventsCount'] = data.groupby('ID')['Count'].apply(lambda x: x.shift(1))

Secondly I need to find how many events had the PUMP RUN for each ID value. I was able to find this out with the below code

data['PumpRun'] = np.where(data['EventDescription'].str.startswith('Pump RUN'), 1, 0)
data['RunCount'] = data.groupby(['ID']).PumpRun.transform('sum')

However I want to know whether there is a simple way to achieve this instead of creating multiple columns for the calculation.

Below is my full code

data = pd.read_csv('eventdata.csv')
data.head()
data = data.sort_values(by=["SiteNumber", "SerialNumber","ID","Time"] , ascending = True)
data['Count'] = data.groupby(data['EventDescription'].str.startswith('Pump RUN').cumsum()).cumcount().add(1)
data['EventsCount'] = data.groupby('ID')['Count'].apply(lambda x: x.shift(1))
data['EventsCount'] = data['EventsCount'].fillna(0)
data.drop(data[data['ID'].eq('')].index, inplace = True)
data['PumpRun'] = np.where(data['EventDescription'].str.startswith('Pump RUN'), 1, 0)
data['RunCount'] = data.groupby(['ID']).PumpRun.transform('sum')
data.to_csv("events_count.csv")

Below is the link for the data.

Data

Upvotes: 0

Views: 102

Answers (1)

sitting_duck
sitting_duck

Reputation: 3720

This may at least answer the first part of your question. The idea is to identify the Pump RUN groups with a cumsum on a shifted boolean mask. Then groupby on that and the ID. From there use transform to create the resulting EventCount sub-column (grouped column). Note the reset across ID boundaries.

Updated to count the events after Pump Run and before the next stop

target_event_phrases = 'Pump RUN|Pump STOP|User STOP|Infusion Complete|Bolus complete'

pump_grp = df['EventDescription'].str.contains(target_event_phrases).cumsum()

dfc = df.assign(EventCount=df.groupby(['ID',pump_grp])['EventDescription'].transform(
           lambda x: ([len(x)-1] if 'Pump RUN' in x.iat[0] else [0]) + [0]*(len(x)-1)
           )
      )

The following is used only to pull out and display sample Run/Stop groups

g = random.choice(pump_grp.unique())
print(f'{g=}')

for i in range(g-2,g+3):
    print(dfc.loc[pg == i,['ID','EventDescription','EventCount']])
    print('--------------------------------------------')

Dataframe Sample Subset

g=126
             ID                       EventDescription  EventCount
469  5983688711                               Pump RUN           3
470  5983688711  Pump rate updated:; - Rate 60.1 mL/hr           0
471  5983688711  Pump rate updated:; - Rate 50.1 mL/hr           0
472  5983688711  Pump rate updated:; - Rate 40.1 mL/hr           0
--------------------------------------------
             ID       EventDescription  EventCount
473  5983688711              Pump STOP           0
474  5983688711  Downstream Occlusion!           0
475  5983688711  Downstream Occl Clr'd           0
--------------------------------------------
             ID                       EventDescription  EventCount
476  5983688711                Pump RUN - Auto-restart           3
477  5983688711  Pump rate updated:; - Rate 35.1 mL/hr           0
478  5983688711  Pump rate updated:; - Rate 30.1 mL/hr           0
479  5983688711  Pump rate updated:; - Rate 25.1 mL/hr           0
--------------------------------------------
             ID   EventDescription  EventCount
480  5983688711  Infusion Complete           0
--------------------------------------------
             ID                    EventDescription  EventCount
481  5983688711                      Pump RUN - KVO           1
482  5983688711  Pump rate updated:; - Rate 1 mL/hr           0
--------------------------------------------

Upvotes: 1

Related Questions