user3356423
user3356423

Reputation: 191

a solution for counting sequential events in pandas

I have the following example data, and I'd like to count action(page_number=2) and action(page_number=3) one time for each try if it is available. I have attached the result I mean. I want to do using pandas dataframe but I don't know how it is.

import pandas as pd
df = pd.DataFrame({'action': ['enter', 'next', 'prev', 'error', 'exit'], 
                   'page_number': [0, 1, 2, 3]})

For example, we have this data

action     page_number
enter      
next        1
prev        2
next        1
next        2
exit        3
enter       
next        1
error       
next        1
error
error
error
next        2
prev        3
prev        2
next        1
prev        2
prev        1
exit        0
enter 
exit

The result I want to achieve is (for session 1):

action                   
enter                     1
next(page number = 1)     1
next(page number = 2)     1
prev(page number = 2)     1
exit                      1

next(page number = 1) appears twice in the first session but I want to count it just once. Even if it occurs 3 times in the first session but I want to count it just once. I'd like to have this rule in all the sessions for all the actions. Each action counts just once in each session.

Thank you in advance for any guidance

Upvotes: 1

Views: 116

Answers (1)

Stef
Stef

Reputation: 30589

First we create a new column 'extended_action' and then groupby this new column:

df['extended_action'] = (df.action + ' (page number = ' + df.page_number.astype(str) + ')').where(df.action.isin(('next', 'prev')), df.action)
df.groupby('extended_action').extended_action.count()

Result (for the whole example dataframe):

extended_action
enter                     3
error                     4
exit                      3
next (page number = 1)    5
next (page number = 2)    2
prev (page number = 1)    1
prev (page number = 2)    3
prev (page number = 3)    1


You can rename the columns after the groupby to get the output in the requested format:

df.groupby('extended_action',as_index=False).action.count().rename(columns={'extended_action': 'action', 'action': 'count'})

Result:

                   action  count
0                   enter      3
1                   error      4
2                    exit      3
3  next (page number = 1)      5
4  next (page number = 2)      2
5  prev (page number = 1)      1
6  prev (page number = 2)      3
7  prev (page number = 3)      1


UPDATE as per comment below:

If you need the count per session (a session lasts from enter to exit) then you need to insert a session number column and group by session and extended_action:

df['session'] = df.action.eq('enter').cumsum()
df.groupby(['session','extended_action']).extended_action.count()

Result:

session  extended_action       
1        enter                     1
         exit                      1
         next (page number = 1)    2
         next (page number = 2)    1
         prev (page number = 2)    1
2        enter                     1
         error                     4
         exit                      1
         next (page number = 1)    3
         next (page number = 2)    1
         prev (page number = 1)    1
         prev (page number = 2)    2
         prev (page number = 3)    1
3        enter                     1
         exit                      1

UPDATE 2 as per changed question and comment below: If you want to count each event only once you just need to drop_duplicates before the grouping:

df.drop_duplicates().groupby(['session','extended_action']).extended_action.count()

Result:

session  extended_action       
1        enter                     1
         exit                      1
         next (page number = 1)    1
         next (page number = 2)    1
         prev (page number = 2)    1
2        enter                     1
         error                     1
         exit                      1
         next (page number = 1)    1
         next (page number = 2)    1
         prev (page number = 1)    1
         prev (page number = 2)    1
         prev (page number = 3)    1
3        enter                     1
         exit                      1

Upvotes: 1

Related Questions