Unsullied
Unsullied

Reputation: 35

Pandas calculate elapsed time

data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:01:55'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 10:01:20'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:46:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 13:27:57'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 18:03:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER01'}
]
df = pd.DataFrame(data)
df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode'])

Hi guys, I'd like to create a new elapsed time column (in hours) for licenses from the sample data of license checkout. This is just a snippet of 500k+ lines.

Conditions for new column:

I've hit a wall when I try to do a groupby agg with a custom function as I can't reference the columns by e.g. df['Action'].

Example of Desired Output :

DateTime Action LicenseCode User Elapsed_Time
2023-09-26 09:36:47 OUT APP01 USER01 0.00
2023-09-26 11:36:47 IN APP01 USER01 2.00
2023-09-27 09:30:00 OUT APP02 USER02 0.00
2023-09-27 14:30:00 IN APP02 USER02 5.00
2023-09-27 15:30:00 OUT APP02 USER02 5.00
2023-09-27 15:45:00 IN APP02 USER02 5.25
2023-09-27 16:10:00 OUT APP01 USER02 0.00
2023-09-27 16:40:00 IN APP01 USER02 0.50
2023-09-27 17:00:00 OUT APP01 USER02 0.50
2023-09-27 17:12:00 OUT APP01 USER02 0.70
2023-09-27 17:42:00 IN APP01 USER02 1.20
2023-09-27 17:52:00 IN APP01 USER02 1.37

Any thoughts will be greatly appreciated!

Upvotes: 0

Views: 65

Answers (2)

Unsullied
Unsullied

Reputation: 35

Thanks to Panda Kim, I've managed to retrieve the desired output. Here is the code modification from @Panda Kim's input.

Code:

data = [    
            {'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-26 11:36:47'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
            {'DateTime': pd.to_datetime('2023-09-27 09:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 14:30:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 15:45:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:10:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 16:40:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:00:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:12:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:42:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
            {'DateTime': pd.to_datetime('2023-09-27 17:52:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'}
]

df = pd.DataFrame(data)
cond = (df['Action'].eq('OUT')) & (df['Action'].shift(+1).eq('IN'))
grp = cond.groupby([df['LicenseCode'],df['User']]).cumsum()
df['elapsed_time'] = df.groupby([pd.Grouper(key='DateTime', freq='D'), grp, 'User', 'LicenseCode'])['DateTime'].diff()
df['elapsed_time'] = df['elapsed_time'].fillna(pd.Timedelta(seconds=0))
df['elapsed_time'] = (df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode']).elapsed_time.cumsum() \
    .dt.total_seconds()/3600)  \
    .round(2)

print(df)

Output: enter image description here

Upvotes: 1

Panda Kim
Panda Kim

Reputation: 13212

Your question is unclear. If you want the following output, use the following code. If the desired output is different from the following, please provide the desired output clearly as a text table.

Also, your example has a typo and cannot run it.

Code

cond = df['Action'].eq('OUT')
grp = cond.groupby(df['LicenseCode']).cumsum()
df['elapsed time'] = df.groupby([grp, 'LicenseCode'])['DateTime'].diff().mask(cond, 0)

df:

             DateTime Action LicenseCode    User     elapsed time
0 2023-09-26 09:36:47    OUT       APP01  USER01                0
1 2023-09-26 11:01:55     IN       APP01  USER01  0 days 01:25:08
2 2023-09-27 10:01:20    OUT       APP02  USER02                0
3 2023-09-27 14:46:09     IN       APP02  USER02  0 days 04:44:49
4 2023-09-27 13:27:57    OUT       APP02  USER01                0
5 2023-09-27 18:03:09     IN       APP02  USER01  0 days 04:35:12

Upvotes: 0

Related Questions