Reputation: 1255
I have a data frame with several columns. One column holds the user name (it is not unique: the same name can appear in many rows), and another, date_time, holds the date and time of each login. I want to add a new column with the cumulative count of logins for each user on each day. Specifically, for each row it should tell me how many times that user had already logged in earlier on the same day, before the current login.
How can I write this in the most efficient way?
The input dataframe is:
user date_time
JK 2016-01-15 09:27:00
KL 2016-02-10 12:17:00
JK 2016-01-15 11:07:00
JK 2016-01-15 12:17:10
KL 2016-02-10 12:47:00
JK 2017-11-15 08:27:00
The output dataframe is:
user date_time cum_count
Jk 2016-01-15 09:27:00 0
KL 2016-02-10 12:17:00 0
Jk 2016-01-15 11:07:00 1
JK 2016-01-15 12:17:10 2
KL 2016-02-10 12:47:00 1
Jk 2017-11-15 08:27:00 0
Upvotes: 2
Views: 2944
Reputation: 30971
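pandas GroupBy objects have a cumcount method, so the obvious solution is just to use it.
A note about grouping: the user names differ in case (Jk vs. JK) and only the date part of date_time matters, so group by the upper-cased name and the normalized date.
So the code can be: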
df['cum_count'] = df.groupby([df.user.str.upper(), df.date_time.dt.normalize()])\
.cumcount()
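For reference, here is a minimal, self-contained sketch of the line above applied to the sample data from the question. The frame construction is my own reconstruction of that sample (using the mixed-case names Jk/JK shown in the expected output), not code from the original post.

```python
import pandas as pd

# Reconstructed sample data (assumption: names are mixed-case, as in the expected output)
df = pd.DataFrame({
    "user": ["Jk", "KL", "Jk", "JK", "KL", "Jk"],
    "date_time": pd.to_datetime([
        "2016-01-15 09:27:00",
        "2016-02-10 12:17:00",
        "2016-01-15 11:07:00",
        "2016-01-15 12:17:10",
        "2016-02-10 12:47:00",
        "2017-11-15 08:27:00",
    ]),
})

# Group by upper-cased user name and by calendar day (time set to midnight),
# then number the rows inside each group starting from 0.
df["cum_count"] = df.groupby(
    [df["user"].str.upper(), df["date_time"].dt.normalize()]
).cumcount()

print(df)
```

Because cumcount starts at 0, the value in each row is the number of earlier rows in the same (user, day) group, which is what the question asks for as long as the rows are already in chronological order.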
Upvotes: 1
Reputation: 30920
Use GroupBy.cumcount
df['cum_count'] = df.groupby(['user', df['date_time'].dt.date]).cumcount()
#if necessary, convert to datetime first
#df['cum_count'] = df.groupby(['user',
# pd.to_datetime(df['date_time']).dt.date]).cumcount()
user date_time cum_count
0 Jk 2016-01-15 09:27:00 0
1 KL 2016-02-10 12:17:00 0
2 Jk 2016-01-15 11:07:00 1
3 JK 2016-01-15 12:17:10 0
4 KL 2016-02-10 12:47:00 1
5 Jk 2017-11-15 08:27:00 0
Use Series.str.lower if you don't want to distinguish between upper and lower case in the user names:
df['cum_count'] = df.groupby([df['user'].str.lower(),
                              df['date_time'].dt.date]).cumcount()
user date_time cum_count
0 Jk 2016-01-15 09:27:00 0
1 KL 2016-02-10 12:17:00 0
2 Jk 2016-01-15 11:07:00 1
3 JK 2016-01-15 12:17:10 2
4 KL 2016-02-10 12:47:00 1
5 Jk 2017-11-15 08:27:00 0
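To make the difference between the two groupings concrete, here is a small sketch on a cut-down frame where date_time is still stored as strings. The frame and the variable names (raw, day, case_sensitive, case_insensitive) are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical three-row frame; date_time is a plain string column here
raw = pd.DataFrame({
    "user": ["Jk", "JK", "Jk"],
    "date_time": [
        "2016-01-15 09:27:00",
        "2016-01-15 11:07:00",
        "2016-01-15 12:17:10",
    ],
})

# Convert the strings to datetimes, then keep only the calendar date
day = pd.to_datetime(raw["date_time"]).dt.date

# "Jk" and "JK" are treated as two different users
case_sensitive = raw.groupby(["user", day]).cumcount()

# "Jk" and "JK" are treated as the same user
case_insensitive = raw.groupby([raw["user"].str.lower(), day]).cumcount()

print(case_sensitive.tolist())
print(case_insensitive.tolist())
```

With case-sensitive grouping the JK row starts its own count, while lower-casing the names first puts all three logins into one running count.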
Upvotes: 1
Reputation: 323226
Let us do groupby + cumcount:
df['cum_count'] = df.groupby([df.user, df.date_time.dt.date]).cumcount()
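One caveat worth keeping in mind: cumcount numbers rows in the order they appear in the frame, not by timestamp. If the rows are not guaranteed to be in chronological order, a possible approach (a sketch under that assumption, with made-up sample rows) is to sort a copy by date_time, compute the count there, and let pandas align the result back on the original index.

```python
import pandas as pd

# Made-up sample where the rows are deliberately out of chronological order
df = pd.DataFrame({
    "user": ["JK", "JK", "JK"],
    "date_time": pd.to_datetime([
        "2016-01-15 12:17:10",
        "2016-01-15 09:27:00",
        "2016-01-15 11:07:00",
    ]),
})

# Sort by time (mergesort is stable, so ties keep their original order)
ordered = df.sort_values("date_time", kind="mergesort")

# Count earlier logins per user and day on the sorted rows
counts = ordered.groupby(
    [ordered["user"], ordered["date_time"].dt.date]
).cumcount()

# Assignment aligns on the index, so the original row order is preserved
df["cum_count"] = counts

print(df)
```

The original row order of df is untouched; only the counts reflect the chronological order.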
Upvotes: 4