Pandas calculate elapsed time

Question

data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:01:55'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 10:01:20'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:46:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 13:27:57'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 18:03:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER01'}
]
df = pd.DataFrame(data)
df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode'])

Hi guys, I'd like to create a new elapsed time column (in hours) for licenses from the sample data of license checkout. This is just a snippet of 500k+ lines.

Conditions for new column:

elapsed time will be grouped by day, user & license code as only elapsed time for each day will be tabulated for each unique user & licensecode
if df['Action'] == OUT, elapsed time = 0
if (df['Action'] == OUT) & (df['Action'].shift() == OUT), elapsed time = df[DateTime].diff()
if 'Action' == IN, elapsed time = calculate cumulative time diff of previous 'OUT' line (they work in pairs OUT then IN)
there are instances of OUT,OUT,IN,IN (order of 'Action' rows), most ideal will be OUT,IN,OUT,IN

I've hit a wall when I try to do a groupby agg with a custom function as I can't reference the columns by e.g. df['Action'].

Example of Desired Output :

DateTime	Action	LicenseCode	User	Elapsed_Time
2023-09-26 09:36:47	OUT	APP01	USER01	0.00
2023-09-26 11:36:47	IN	APP01	USER01	2.00
2023-09-27 09:30:00	OUT	APP02	USER02	0.00
2023-09-27 14:30:00	IN	APP02	USER02	5.00
2023-09-27 15:30:00	OUT	APP02	USER02	5.00
2023-09-27 15:45:00	IN	APP02	USER02	5.25
2023-09-27 16:10:00	OUT	APP01	USER02	0.00
2023-09-27 16:40:00	IN	APP01	USER02	0.50
2023-09-27 17:00:00	OUT	APP01	USER02	0.50
2023-09-27 17:12:00	OUT	APP01	USER02	0.70
2023-09-27 17:42:00	IN	APP01	USER02	1.20
2023-09-27 17:52:00	IN	APP01	USER02	1.37

Any thoughts will be greatly appreciated!

Unsullied · Accepted Answer

Thanks to Panda Kim, I've managed to retrieve the desired output. Here is the code modification from @Panda Kim's input.

Code:

data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:36:47'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 09:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:30:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:45:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:10:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:40:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:00:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:12:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:42:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:52:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'}
]

df = pd.DataFrame(data)
cond = (df['Action'].eq('OUT')) & (df['Action'].shift(+1).eq('IN'))
grp = cond.groupby([df['LicenseCode'],df['User']]).cumsum()
df['elapsed_time'] = df.groupby([pd.Grouper(key='DateTime', freq='D'), grp, 'User', 'LicenseCode'])['DateTime'].diff()
df['elapsed_time'] = df['elapsed_time'].fillna(pd.Timedelta(seconds=0))
df['elapsed_time'] = (df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode']).elapsed_time.cumsum() \
    .dt.total_seconds()/3600)  \
    .round(2)

print(df)

Output:

Pandas calculate elapsed time

Answers (2)

Related Questions