Reputation: 191
I have the following example data, and I'd like to count action(page_number=2) and action(page_number=3) one time for each try if it is available. I have attached the result I mean. I want to do using pandas dataframe but I don't know how it is.
import pandas as pd
df = pd.DataFrame({'action': ['enter', 'next', 'prev', 'error', 'exit'],
'page_number': [0, 1, 2, 3]})
For example, we have this data
action page_number
enter
next 1
prev 2
next 1
next 2
exit 3
enter
next 1
error
next 1
error
error
error
next 2
prev 3
prev 2
next 1
prev 2
prev 1
exit 0
enter
exit
The result I want to achieve is (for session 1):
action
enter 1
next(page number = 1) 1
next(page number = 2) 1
prev(page number = 2) 1
exit 1
next(page number = 1) appears twice in the first session but I want to count it just once. Even if it occurs 3 times in the first session but I want to count it just once. I'd like to have this rule in all the sessions for all the actions. Each action counts just once in each session.
Thank you in advance for any guidance
Upvotes: 1
Views: 116
Reputation: 30589
First we create a new column 'extended_action' and then groupby this new column:
df['extended_action'] = (df.action + ' (page number = ' + df.page_number.astype(str) + ')').where(df.action.isin(('next', 'prev')), df.action)
df.groupby('extended_action').extended_action.count()
Result (for the whole example dataframe):
extended_action
enter 3
error 4
exit 3
next (page number = 1) 5
next (page number = 2) 2
prev (page number = 1) 1
prev (page number = 2) 3
prev (page number = 3) 1
df.groupby('extended_action',as_index=False).action.count().rename(columns={'extended_action': 'action', 'action': 'count'})
Result:
action count
0 enter 3
1 error 4
2 exit 3
3 next (page number = 1) 5
4 next (page number = 2) 2
5 prev (page number = 1) 1
6 prev (page number = 2) 3
7 prev (page number = 3) 1
If you need the count per session (a session lasts from enter
to exit
) then you need to insert a session number column and group by session
and extended_action
:
df['session'] = df.action.eq('enter').cumsum()
df.groupby(['session','extended_action']).extended_action.count()
Result:
session extended_action
1 enter 1
exit 1
next (page number = 1) 2
next (page number = 2) 1
prev (page number = 2) 1
2 enter 1
error 4
exit 1
next (page number = 1) 3
next (page number = 2) 1
prev (page number = 1) 1
prev (page number = 2) 2
prev (page number = 3) 1
3 enter 1
exit 1
UPDATE 2 as per changed question and comment below:
If you want to count each event only once you just need to drop_duplicates
before the grouping:
df.drop_duplicates().groupby(['session','extended_action']).extended_action.count()
Result:
session extended_action
1 enter 1
exit 1
next (page number = 1) 1
next (page number = 2) 1
prev (page number = 2) 1
2 enter 1
error 1
exit 1
next (page number = 1) 1
next (page number = 2) 1
prev (page number = 1) 1
prev (page number = 2) 1
prev (page number = 3) 1
3 enter 1
exit 1
Upvotes: 1