Reputation: 62634
I have the following pandas dataframe format read into a CSV under variable "mydataframe" (that is sorted by RegisterTime):
Name, RegistrationID, RegisterTime
Jordan, 1, 2017-08-01T00:00:05
Jordan, 2, 2017-08-01T00:00:08
Jordan, 3, 2017-08-01T00:00:10
Sarah, 4, 2017-08-01T00:00:15
Jordan, 42, 2017-08-01T00:00:16
Sarah, 54, 2017-08-01T00:00:20
Jordan, 53, 2017-08-01T00:00:30
Jordan, 55, 2017-08-01T00:00:32
The dtype of my 'RegisterTime' column is 'object', and a printed value looks like '2017-08-01T00:00:15'.
I want to 'group' similar actions performed by a user in one sitting, within 10 seconds of each other, into a single group. I want to assign a new column 'ActionGroup' that holds an incremental value based on this.
Sample output dataframe:
Name, RegistrationID, RegisterTime, ActionGroup
Jordan, 1, 2017-08-01T00:00:05, 1
Jordan, 2, 2017-08-01T00:00:08, 1
Jordan, 3, 2017-08-01T00:00:10, 1
Sarah, 4, 2017-08-01T00:00:15, 2
Jordan, 42, 2017-08-01T00:00:16, 1
Sarah, 54, 2017-08-01T00:00:20, 2
Jordan, 53, 2017-08-01T00:00:30, 3
Jordan, 55, 2017-08-01T00:00:32, 3
What is the best way to go about this (adding the ActionGroup column for the dataframe)?
Upvotes: 3
Views: 158
Reputation: 11602
Here is a slight rewrite of the answer by Chris, making use of a "global" counter.
from itertools import count
import pandas as pd

# RegisterTime must be datetime for .diff().dt.seconds to work
df['RegisterTime'] = pd.to_datetime(df['RegisterTime'])

close = 10  # maximum gap, in seconds, within one ActionGroup
ctr = count(1)
df['ActionGroup'] = df.groupby('Name').RegisterTime.transform(
    # Processing each Name:
    # True if the row starts an ActionGroup.
    lambda x: x.diff().dt.seconds.fillna(close + 1).gt(close)
    # New ActionGroup ID if the row starts an ActionGroup, else 0.
    .apply(lambda b: next(ctr) if b else 0)
    # Forward fill, replacing zeros with the
    # previous non-zero ActionGroup ID.
    .replace(to_replace=0, method='ffill')
)
In the end, this calls a pure Python function for each row, so one should not expect great performance. If speed is an issue, one could define a global counter using numba.jitclass that converts boolean arrays into labelled arrays.
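Before reaching for numba, the per-row Python calls can also be avoided with plain pandas: mark the rows that start a sitting, label those start rows with a cumulative counter, and forward-fill the label within each Name. A minimal sketch (the DataFrame construction and the 10-second threshold are taken from the question; the intermediate names `gap`, `starts`, and `ids` are my own):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jordan', 'Jordan', 'Jordan', 'Sarah',
             'Jordan', 'Sarah', 'Jordan', 'Jordan'],
    'RegistrationID': [1, 2, 3, 4, 42, 54, 53, 55],
    'RegisterTime': pd.to_datetime([
        '2017-08-01T00:00:05', '2017-08-01T00:00:08',
        '2017-08-01T00:00:10', '2017-08-01T00:00:15',
        '2017-08-01T00:00:16', '2017-08-01T00:00:20',
        '2017-08-01T00:00:30', '2017-08-01T00:00:32']),
})

# True where a row starts a new sitting: either the first row for a
# Name, or a gap of more than 10 seconds since that Name's previous row.
gap = df.groupby('Name')['RegisterTime'].diff().gt(pd.Timedelta(seconds=10))
starts = gap | ~df['Name'].duplicated()

# Number the start rows 1, 2, 3, ... and leave the rest as NaN,
# then forward-fill the ID within each Name.
ids = starts.cumsum().where(starts)
df['ActionGroup'] = ids.groupby(df['Name']).ffill().astype(int)
```

This stays entirely in vectorized pandas operations, and on the sample data it reproduces the expected `ActionGroup` column (1, 1, 1, 2, 1, 2, 3, 3).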
Upvotes: 0
Reputation: 29742
Not the clearest way, but it works:
import pandas as pd
df['RegisterTime'] = pd.to_datetime(df['RegisterTime'])
s = df.groupby('Name').apply(
    # per Name: flag gaps > 10s, cumulative-sum them into a per-Name
    # session counter, then append the Name to make the key unique
    lambda x: x['RegisterTime'].diff().dt.seconds.fillna(0)
              .gt(10).cumsum().astype(str) + x['Name']
)
df['ActionGroup'] = df.groupby(s.droplevel(0)).ngroup() + 1
print(df)
Output:
Name RegistrationID RegisterTime ActionGroup
0 Jordan 1 2017-08-01 00:00:05 1
1 Jordan 2 2017-08-01 00:00:08 1
2 Jordan 3 2017-08-01 00:00:10 1
3 Sarah 4 2017-08-01 00:00:15 2
4 Jordan 42 2017-08-01 00:00:16 1
5 Sarah 54 2017-08-01 00:00:20 2
6 Jordan 53 2017-08-01 00:00:30 3
7 Jordan 55 2017-08-01 00:00:32 3
Upvotes: 2