Reputation: 4654
I have a dataframe of the form:
userid | event_time | activity
A 2017-01-01 02:20:34 E1
A 2017-01-01 02:20:50 E2
A 2017-03-01 11:23:43 E1
A 2017-03-01 11:23:55 E6
B 2017-01-01 08:24:32 E1
B 2017-01-01 08:25:25 E4
C 2017-01-01 23:19:21 E3
C 2017-01-01 23:20:04 E11
I would like to apply a function to each group (grouped by userid) that counts how many times the user had already experienced the same event. For example, user A re-experienced E1 at 2017-03-01 11:23:43, so the expected output for user A is:
userid | activity | cnt_previous_events
A E1 0
A E2 0
A E1 1
A E6 0
I have tried the following:
def previous_event_ctr(group):
    events = set()
    ctr = 0
    for val in group:
        if val in events:
            ctr += 1
        else:
            events.add(val)
    return ctr
I then applied it to the dataframe column:
df.groupby('userid').activity.agg(previous_event_ctr)
But I keep getting TypeError: 'Series' objects are mutable, thus they cannot be hashed. How should I apply this function to my dataframe using groupby?
Upvotes: 3
Views: 459
Reputation: 863531
It seems you need cumcount; df has to be sorted by userid and event_time first:
df['count'] = (df.sort_values(['userid', 'event_time'])
                 .groupby(['userid', 'activity']).activity.cumcount())
print(df)
userid event_time activity count
0 A 2017-01-01 02:20:34 E1 0
1 A 2017-01-01 02:20:50 E2 0
2 A 2017-03-01 11:23:43 E1 1
3 A 2017-03-01 11:23:55 E6 0
4 B 2017-01-01 08:24:32 E1 0
5 B 2017-01-01 08:25:25 E4 0
6 C 2017-01-01 23:19:21 E3 0
7 C 2017-01-01 23:20:04 E11 0
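To try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies cumcount to it. Building the frame by hand with pd.DataFrame is an assumption about how the data is loaded, and the column name cnt_previous_events is borrowed from the desired output in the question rather than from the snippet above.

```python
import pandas as pd

# Sample data copied from the question (constructing it by hand is an assumption).
df = pd.DataFrame({
    "userid": ["A", "A", "A", "A", "B", "B", "C", "C"],
    "event_time": pd.to_datetime([
        "2017-01-01 02:20:34", "2017-01-01 02:20:50",
        "2017-03-01 11:23:43", "2017-03-01 11:23:55",
        "2017-01-01 08:24:32", "2017-01-01 08:25:25",
        "2017-01-01 23:19:21", "2017-01-01 23:20:04",
    ]),
    "activity": ["E1", "E2", "E1", "E6", "E1", "E4", "E3", "E11"],
})

# Sort chronologically within each user so earlier events are numbered first.
df = df.sort_values(["userid", "event_time"])

# cumcount numbers the rows of each (userid, activity) group starting at 0,
# which equals how many times the user has already seen that activity.
df["cnt_previous_events"] = df.groupby(["userid", "activity"]).cumcount()

print(df)
```

Because the count is computed per (userid, activity) pair, no custom Python function or set bookkeeping is needed; each row simply receives the number of earlier rows in its own group.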
Upvotes: 5