maverick
maverick

Reputation: 85

Re-distribute job runs across dates

I have a table run_schedule that maintains the future dates associated with specific job runs like the one shown below

id  job_type    run_dt
1   A           2020-05-25
2   A           2020-05-25
3   A           2020-05-25
4   A           2020-05-25
5   A           2020-05-25
6   A           2020-05-24
7   A           2020-05-24
8   A           2020-05-23
9   B           2020-05-25
10  B           2020-05-25
11  B           2020-05-25
12  B           2020-05-25
13  B           2020-05-25
14  B           2020-05-25
15  B           2020-05-24
16  B           2020-05-24

In my case the number of jobs by type per day varies drastically and some days have a skew.

select job_type, run_dt, count(*)
  from run_schedule
 group by job_type, run_dt
 order by job_type;

job_type    run_dt      count
A           2020-05-23  1
A           2020-05-25  5
A           2020-05-24  2
B           2020-05-25  6
B           2020-05-24  2

Is there any way in postgres that I can re-distribute the job run dates (run_dt column) so that it's more uniformly distributed? Ideally I would like to see a distribution like the one below,

job_type    run_dt      count
A           2020-05-23  3
A           2020-05-25  3
A           2020-05-24  2
B           2020-05-25  3
B           2020-05-24  4

So effectively, I've re-scheduled 2 jobs of type A which were originally scheduled to run on the 25th to the 23rd and 2 jobs of type B to run on the 24th bringing them closer to the average of each job type across those days.

Upvotes: 1

Views: 40

Answers (1)

GMB
GMB

Reputation: 222582

You could use window functions to get the minimum and maximum date of each job type, along with the total number of records and the rank of each record within the group. Using this information, you can equally spread jobs across dates:

select
    id,
    job_type,
    date(min_run_dt + (max_run_dt - min_run_dt + 1) * (rn - 1) / cnt * interval '1 day') run_dt
from (
    select 
        t.*, 
        min(run_dt) over(partition by job_type) min_run_dt , 
        max(run_dt) over(partition by job_type) max_run_dt, 
        count(*) over(partition by job_type) cnt,
        row_number() over(partition by job_type order by run_dt) rn
    from mytable t
) t
order by job_type, run_dt

For your sample data, this produces:

id | job_type | run_dt    
-: | :------- | :---------
 8 | A        | 2020-05-23
 6 | A        | 2020-05-23
 7 | A        | 2020-05-23
 4 | A        | 2020-05-24
 1 | A        | 2020-05-24
 5 | A        | 2020-05-24
 2 | A        | 2020-05-25
 3 | A        | 2020-05-25
16 | B        | 2020-05-24
15 | B        | 2020-05-24
11 | B        | 2020-05-24
12 | B        | 2020-05-24
13 | B        | 2020-05-25
14 | B        | 2020-05-25
 9 | B        | 2020-05-25
10 | B        | 2020-05-25

Upvotes: 1

Related Questions