Reputation: 35
data = [
{'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
{'DateTime': pd.to_datetime('2023-09-26 11:01:55'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
{'DateTime': pd.to_datetime('2023-09-27 10:01:20'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 14:46:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 13:27:57'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER01'},
{'DateTime': pd.to_datetime('2023-09-27 18:03:09'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER01'}
]
df = pd.DataFrame(data)
df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode'])
Hi guys, I'd like to create a new elapsed time column (in hours) for licenses from the sample data of license checkout. This is just a snippet of 500k+ lines.
Conditions for new column:
elapsed time will be grouped by day, user & license code as only elapsed time for each day will be tabulated for each unique user & licensecode
if df['Action'] == OUT, elapsed time = 0
if (df['Action'] == OUT) & (df['Action'].shift() == OUT), elapsed time = df[DateTime].diff()
if 'Action' == IN, elapsed time = calculate cumulative time diff of previous 'OUT' line (they work in pairs OUT then IN)
there are instances of OUT,OUT,IN,IN (order of 'Action' rows), most ideal will be OUT,IN,OUT,IN
I've hit a wall when I try to do a groupby agg with a custom function as I can't reference the columns by e.g. df['Action'].
Example of Desired Output :
DateTime | Action | LicenseCode | User | Elapsed_Time |
---|---|---|---|---|
2023-09-26 09:36:47 | OUT | APP01 | USER01 | 0.00 |
2023-09-26 11:36:47 | IN | APP01 | USER01 | 2.00 |
2023-09-27 09:30:00 | OUT | APP02 | USER02 | 0.00 |
2023-09-27 14:30:00 | IN | APP02 | USER02 | 5.00 |
2023-09-27 15:30:00 | OUT | APP02 | USER02 | 5.00 |
2023-09-27 15:45:00 | IN | APP02 | USER02 | 5.25 |
2023-09-27 16:10:00 | OUT | APP01 | USER02 | 0.00 |
2023-09-27 16:40:00 | IN | APP01 | USER02 | 0.50 |
2023-09-27 17:00:00 | OUT | APP01 | USER02 | 0.50 |
2023-09-27 17:12:00 | OUT | APP01 | USER02 | 0.70 |
2023-09-27 17:42:00 | IN | APP01 | USER02 | 1.20 |
2023-09-27 17:52:00 | IN | APP01 | USER02 | 1.37 |
Any thoughts will be greatly appreciated!
Upvotes: 0
Views: 65
Reputation: 35
Thanks to Panda Kim, I've managed to retrieve the desired output. Here is the code modification from @Panda Kim's input.
Code:
data = [
{'DateTime': pd.to_datetime('2023-09-26 09:36:47'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER01'},
{'DateTime': pd.to_datetime('2023-09-26 11:36:47'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER01'},
{'DateTime': pd.to_datetime('2023-09-27 09:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 14:30:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 15:30:00'), 'Action': 'OUT', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 15:45:00'), 'Action': 'IN', 'LicenseCode': 'APP02', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 16:10:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 16:40:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 17:00:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 17:12:00'), 'Action': 'OUT', 'LicenseCode': 'APP01', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 17:42:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'},
{'DateTime': pd.to_datetime('2023-09-27 17:52:00'), 'Action': 'IN', 'LicenseCode': 'APP01', 'User': 'USER02'}
]
df = pd.DataFrame(data)
cond = (df['Action'].eq('OUT')) & (df['Action'].shift(+1).eq('IN'))
grp = cond.groupby([df['LicenseCode'],df['User']]).cumsum()
df['elapsed_time'] = df.groupby([pd.Grouper(key='DateTime', freq='D'), grp, 'User', 'LicenseCode'])['DateTime'].diff()
df['elapsed_time'] = df['elapsed_time'].fillna(pd.Timedelta(seconds=0))
df['elapsed_time'] = (df.groupby([pd.Grouper(key='DateTime', freq='D'),'User', 'LicenseCode']).elapsed_time.cumsum() \
.dt.total_seconds()/3600) \
.round(2)
print(df)
Upvotes: 1
Reputation: 13212
Your question is unclear. If you want the following output, use the following code. If the desired output is different from the following, please provide the desired output clearly as a text table.
Also, your example has a typo and cannot run it.
Code
cond = df['Action'].eq('OUT')
grp = cond.groupby(df['LicenseCode']).cumsum()
df['elapsed time'] = df.groupby([grp, 'LicenseCode'])['DateTime'].diff().mask(cond, 0)
df:
DateTime Action LicenseCode User elapsed time
0 2023-09-26 09:36:47 OUT APP01 USER01 0
1 2023-09-26 11:01:55 IN APP01 USER01 0 days 01:25:08
2 2023-09-27 10:01:20 OUT APP02 USER02 0
3 2023-09-27 14:46:09 IN APP02 USER02 0 days 04:44:49
4 2023-09-27 13:27:57 OUT APP02 USER01 0
5 2023-09-27 18:03:09 IN APP02 USER01 0 days 04:35:12
Upvotes: 0