Reputation: 31
I need help in grouping my data by date to find Sum of Days worked
and Total Hours worked
for each ID
. My data is in a txt file. An example of my data is provided below :
ID,Date,Start Time,End Time
1,02/11/2019,08:25,16:24
1,03/11/2019,07:54,16:20
1,04/11/2019,07:46,16:08
1,05/11/2019,08:13,16:10
1,06/11/2019,07:48,15:24
1,08/11/2019,08:27,15:26
1,09/11/2019,08:25,16:02
1,10/11/2019,07:49,15:03
1,11/11/2019,08:18,15:29
1,12/11/2019,07:54,16:21
1,15/11/2019,07:50,16:24
1,16/11/2019,07:48,16:25
1,17/11/2019,08:25,15:28
1,19/11/2019,07:49,16:10
1,21/11/2019,08:11,15:06
1,22/11/2019,07:58,15:29
1,23/11/2019,07:49,15:20
1,24/11/2019,08:28,16:28
1,26/11/2019,08:29,16:14
1,27/11/2019,07:59,15:18
1,28/11/2019,07:53,16:24
1,29/11/2019,07:49,16:25
1,30/11/2019,08:25,15:08
1,31/11/2019,08:10,16:09
2,01/11/2019,07:58,15:22
2,02/11/2019,07:53,16:16
2,03/11/2019,08:13,15:05
2,04/11/2019,07:54,16:04
An example of the final output I require is provided below:
ID,Hours Worked,Days Worked
1,168,24
I have been trying for a week using lists and dictionaries but have not been able to come up with a solution. Any help using datetime methods is much appreciated!
Upvotes: 0
Views: 2327
Reputation: 123
If you can use the pandas library, here's something you can do
import pandas as pd
df=pd.read_csv('filename.csv')
df['Duration']=(pd.to_datetime(df['End Time'], format='%H:%M')-pd.to_datetime(df['Start Time'], format='%H:%M')).dt.components['hours']
df.groupby('ID').Duration.agg('sum')
result=df.groupby('ID').agg({'Duration': ['sum'], 'Date': ['count']})
Here, we read the file into a pandas dataframe, create a column with timedelta values for the start and end times, then group by the 'ID' and aggregate the sum over column 'Duration' and the count over column 'Date' as this gives the number of days worked.
The output should look something like this -
Duration Date
sum count
ID
1 174 24
2 29 4
Upvotes: 2
Reputation: 254
if you are sure that timestamp in your input for each id does not duplicate or overlap
this solution store id: hout works and should be easy to transfer hour -> day
from datetime import datetime
testlist = [
[1,"02/11/2019","08:25","16:25"],
[1,"03/11/2019","07:25","16:25"],
[2,"02/13/2019","08:25","17:25"],
[2,"03/13/2019","06:25","16:25"]
]
fmt = '%H:%M'
dictx = {}
for time in testlist:
if time[0] not in dictx:
dictx[time[0]] = 0
tstamp1 = datetime.strptime(time[2], fmt)
tstamp2 = datetime.strptime(time[3], fmt)
dictx[time[0]] += int(round((tstamp2 - tstamp1).total_seconds()) / 3600)
print(dictx)
Upvotes: 0