Reputation: 1
I have a PostgreSQL table that contains job history:
CREATE TABLE jobHistory
(
jobid int4,
starttime timestamptz,
endtime timestamptz
);
I'm trying to create an SQL query to show, for each hour, the average number of running jobs. There should be 24 rows (0-23 for 24 hours), to contain the average number of jobs running that hour.
The total number of rows in my jobHistory
table is around 20k. Here's a sample:
jobid startTime endTime
------------------------------------------------------
377 2017-11-02 15:43:43.0 2017-11-02 15:49:22.0
114 2017-11-02 15:47:05.0 2017-11-02 15:56:19.0
378 2017-11-02 15:49:22.0 2017-11-02 15:53:01.0
379 2017-11-02 15:53:01.0 2017-11-02 16:33:16.0
380 2017-11-02 16:33:16.0 2017-11-02 16:51:26.0
381 2017-11-02 16:51:26.0 2017-11-02 17:03:00.0
382 2017-11-02 17:03:00.0 2017-11-02 17:11:59.0
383 2017-11-02 17:11:59.0 2017-11-02 17:12:44.0
384 2017-11-02 17:12:43.0 2017-11-02 17:13:36.0
385 2017-11-02 17:13:36.0 2017-11-02 17:14:20.0
386 2017-11-02 17:14:20.0 2017-11-02 17:15:08.0
387 2017-11-02 17:15:08.0 2017-11-02 17:16:03.0
388 2017-11-02 17:16:03.0 2017-11-02 18:31:01.0
150 2017-11-02 17:55:00.0 2017-11-02 18:16:32.0
153 2017-11-02 17:55:00.0 2017-11-02 18:15:28.0
135 2017-11-02 17:55:00.0 2017-11-02 17:55:01.0
132 2017-11-02 17:55:00.0 2017-11-02 18:13:00.0
156 2017-11-02 17:55:00.0 2017-11-02 17:55:06.0
147 2017-11-02 17:55:00.0 2017-11-02 17:57:47.0
138 2017-11-02 17:55:00.0 2017-11-02 18:12:33.0
140 2017-11-02 17:55:00.0 2017-11-02 17:55:17.0
389 2017-11-02 18:31:01.0 2017-11-02 18:38:11.0
390 2017-11-02 18:38:11.0 2017-11-02 18:38:35.0
391 2017-11-02 18:38:35.0 2017-11-02 18:38:39.0
I'm looking for average number of jobs per hour. Jobs are running 24 hours a day, all day. Some hours have more jobs running than others. There's about 120 jobs total. Some take minutes, others several hours. To keep the load balanced, when I schedule a new job, I'd like to see which hour has least number of jobs on average.
Upvotes: 0
Views: 287
Reputation: 1
Thanks for everyone who took a look at this. I'm convinced there is a way to get the answer using pure SQL, but I just can't figure it out. To answer my question, I coded a perl script to get the results.
Here's how it works.
1.) Get the number of days between the earliest job record and the latest job record, this will be used to AVERAGE the number of jobs for each occurrence of each hour 0-23.
2.) For each job record, increment a counter for each hour of the day that the job was running. For example, if the job ran from 2pm - 6pm, the script will increment a counter for the 2,3,4,5 and 6 hours. If the job was longer than 23 hours, I increment the values of 0-23 just once. At this point, I have a result that has 2 columns. The first column is the hour (0-23) and the second column is the number occurrences of jobs that ran during that hour.
3.) My script then takes the number from step 2 above and divides it by the number of days to produce an average.
Because I've written a perl script to produce the result, I now know that the best time slot to schedule new jobs is at 11AM.
If anyone is interested in tackling the SQL, I'd be interested in seeing how it's done. For now, I'll just keep my perl script handy.
Here's a link to the full Job History Data:
https://www.dropbox.com/s/bi66s0thh4sfp46/JobHistoryData.csv?dl=0
Here's the output of my perl script that calculates the average number of running jobs for each hour:
0 = 2.49
1 = 2.10
2 = 2.16
3 = 2.02
4 = 4.10
5 = 2.86
6 = 2.00
7 = 1.69
8 = 1.69
9 = 1.80
10 = 1.67
11 = 1.53
12 = 1.76
13 = 1.73
14 = 8.33
15 = 2.24
16 = 1.63
17 = 8.67
18 = 6.12
19 = 5.00
20 = 4.84
21 = 3.76
22 = 10.43
23 = 4.37
Upvotes: 0
Reputation: 2465
Your question is not clear.
I am assuming that you are looking for total number of running jobs per hour
on the basis of column starttime
, so below is the query.
SELECT count(jobid) AS TotalJobs,
date_trunc('hour', starttime) AS Date_Hour
FROM jobHistory
GROUP BY date_trunc('hour', starttime)
order by Date_Hour;
If you need it on the basis of endTime
, you can change the above query replacing starttime
with endTime
Result:
totaljobs date_hour
-------------------------------
4 02.11.2017 15:00:00
2 02.11.2017 16:00:00
15 02.11.2017 17:00:00
3 02.11.2017 18:00:00
Hope this will help
Upvotes: 2