Reputation: 377
I am working on scheduler-like code (in PHP, if that matters) and ran into an interesting problem: it's easy to reschedule a recurring task, but what if, for some reason, it ran significantly later than it was supposed to?
For example, let's say a job needs to run every hour and its next scheduled run is 13.05.2021 18:00, but it actually runs at 13.05.2021 20:00. Normal rescheduling logic would take the original scheduled time and add the recurrence frequency (1 hour in this case), but that would make the new time 13.05.2021 19:00, which is already in the past and can cause the job to run twice. We could, theoretically, use the "last run" time instead, but that can be something like 13.05.2021 20:03, which would make the new time 13.05.2021 21:03.
Now my question is: what logic can we use so that, in this case, the next time would be 13.05.2021 21:00? I've tried googling this but was not able to find anything. And I do see that Event Scheduler in Windows, for example, reschedules jobs the way I want.
Upvotes: 0
Views: 120
Reputation: 377
I actually found a pretty easy way to do what I needed, so posting it as an answer.
If we have a value of frequency in seconds (in my case, at least) and we have the original nextrun, which is when a task was supposed to be run initially, then the logic is as follows:
1. Get the current time (time(), UTC_TIMESTAMP() or whatever).
2. Compare it with nextrun and get the difference between them in seconds.
3. Divide the difference by frequency.
4. Round the result up (ceil()). If we have a value lower than 1, we may want to sanitize it.
5. Multiply the result by frequency, which will give us a different result than on step 2, which is the salt of this method.
6. Add the result to nextrun.
And that's it. This does not guarantee that you won't ever have a task run twice if it ended just a few seconds before the time value on step 6, but to my knowledge MS Event Scheduler has the same "flaw".
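For anyone who would rather do this in application code, the steps above can be sketched in Python. This is only an illustration under assumptions: the function name next_run and its parameters are made up for the example, and the sanitizing rule (never fewer than one interval) mirrors the one described in step 4.

```python
import math
from datetime import datetime, timedelta


def next_run(nextrun: datetime, now: datetime, frequency: int) -> datetime:
    """Return a slot on nextrun's grid that is not before `now`
    and at least one interval after `nextrun`.

    `frequency` is the recurrence interval in seconds.
    """
    # Steps 1-2: seconds between the originally scheduled time and now.
    diff = (now - nextrun).total_seconds()
    # Steps 3-4: intervals elapsed, rounded up and sanitized to at least 1.
    intervals = max(math.ceil(diff / frequency), 1)
    # Steps 5-6: move forward by that many whole intervals.
    return nextrun + timedelta(seconds=intervals * frequency)


scheduled = datetime(2021, 5, 13, 18, 0)
finished = datetime(2021, 5, 13, 20, 3)
print(next_run(scheduled, finished, 3600))  # 2021-05-13 21:00:00
```

With the numbers from the question, the difference is 7380 seconds, which is 2.05 intervals, rounded up to 3, so the new time lands on 21:00 rather than 21:03.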
Since I am doing this calculation in SQL, here's how this would look in SQL (at least for MySQL/MariaDB):
UPDATE `cron__schedule` SET `nextrun`=TIMESTAMPADD(SECOND, IF(CEIL(TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP())/`frequency`) > 0, CEIL(TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP())/`frequency`), 1)*`frequency`, `nextrun`)
To explain by referencing the logic above:
1. UTC_TIMESTAMP()
2. TIMESTAMPDIFF(SECOND, `nextrun`, UTC_TIMESTAMP()) - time comparison in seconds.
3. TIMESTAMPDIFF(...)/`frequency`
4. CEIL(...) to round up the value. IF(...) is used to sanitize, since we can get 0 seconds, which would leave the time unchanged.
5. CEIL(...)*`frequency`
6. TIMESTAMPADD(...)
I do not like having to use TIMESTAMPDIFF(...) twice because of IF(...), but I do not know a way to avoid that without moving to a stored procedure, which feels like overkill. Besides, as far as I know, MySQL should calculate this value only once regardless. But if someone can advise me on a cleaner approach, I'll update the answer.
Upvotes: 1
Reputation: 55447
There isn't a right or wrong in this situation; it really depends on your business logic and how you want to build this.
WordPress and Drupal, two of the largest CMSs out there have faced this problem, too, which boils down to "poor man's cron" versus "system cron". For a "poor man's cron", these systems rely on someone hitting the website in order to "wake" the scheduler up, and if no one visits your site in a month, your tasks don't run, either. Both of these systems instead recommend using the system's cron to be more consistent and "wake up" the scheduler at certain intervals. I would encourage you to explore this in your system, too.
The next problem is, how are you storing your recurrence? Do you have (effectively) a table with every possible run time, so that an hourly run has 24 entries? Or is there just a single task that has an ideal run date/time? The latter is generally easier to control than the former, which stores a lot of duplicated data.
Then, do tasks reschedule themselves, does the scheduler do that, or is there a middle ground where the scheduler asks the task for the next best run? Figuring this out is very important, and there are some nuances.
Another thing to think about: what happens if a task runs earlier than planned? For instance, does the world break if a task runs at 01:00 and 01:15, or is it just sub-optimal?
Generally, when I build these types of systems, my tasks conform to a pattern (an interface in OOP) and support a "next run time". The scheduler pulls all of the tasks from a data store that have an expired "next run time" and runs them. Doing this, there's no chance for a single task to exist at both 01:00 and 02:00 because it will only exist in the data store once, for instance at 01:00. If the scheduler then wakes up at 01:15, it finds the 01:00 task, which has expired, and runs it, and then it asks the task for the next run. The task looks at the clock (or the time as provided by the scheduler, if you are running in a distributed environment) and performs its own logic to determine that. If the logic is every hour, you can add 60 minutes to "now" and then drop the minutes portion, so 01:15 becomes 02:00.
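As a rough sketch of that flow in Python: everything here is hypothetical, including the in-memory dict standing in for the data store and the names next_hourly_run and due_tasks. It only shows the "add an hour, drop the minutes" rule and the "pull expired tasks" step, not a full scheduler.

```python
from datetime import datetime, timedelta


def next_hourly_run(now: datetime) -> datetime:
    """Add 60 minutes to `now`, then drop the minutes and smaller units."""
    later = now + timedelta(hours=1)
    return later.replace(minute=0, second=0, microsecond=0)


def due_tasks(tasks: dict[str, datetime], now: datetime) -> list[str]:
    """Names of tasks whose stored next run time has expired."""
    return [name for name, run_at in tasks.items() if run_at <= now]


# Hypothetical data store: each task exists exactly once, keyed by name.
store = {"cleanup": datetime(2021, 5, 13, 1, 0)}
now = datetime(2021, 5, 13, 1, 15)  # the scheduler wakes up late

for name in due_tasks(store, now):
    # The task body would run here; afterwards it reports its next slot.
    store[name] = next_hourly_run(now)

print(store["cleanup"])  # 2021-05-13 02:00:00
```

Because the store holds a single entry per task, overwriting it with the new time is what prevents the same task from being due at both 01:00 and 02:00.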
Throw some exception handling and possibly database transactions into this mix, too, so that a task can never fail and still get rescheduled.
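One possible shape for that, sketched in Python with the standard sqlite3 module. The schedule table, its columns, and the run_and_reschedule helper are assumptions made up for this example; the point is only that the new run time is written inside a transaction that rolls back if the job raises.

```python
import sqlite3
from datetime import datetime, timedelta


def run_and_reschedule(conn, name, job, now):
    """Run `job`; keep the new run time only if the job succeeds."""
    next_run = (now + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE schedule SET nextrun = ? WHERE name = ?",
                (next_run.isoformat(" "), name),
            )
            job()  # an exception here undoes the UPDATE above
    except Exception:
        return False  # the task stays due for the next wake-up
    return True


def failing_job():
    raise RuntimeError("job failed")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schedule (name TEXT PRIMARY KEY, nextrun TEXT)")
conn.execute("INSERT INTO schedule VALUES ('cleanup', '2021-05-13 01:00:00')")
conn.commit()

now = datetime(2021, 5, 13, 1, 15)
run_and_reschedule(conn, "cleanup", failing_job, now)
print(conn.execute("SELECT nextrun FROM schedule").fetchone()[0])  # 2021-05-13 01:00:00
run_and_reschedule(conn, "cleanup", lambda: None, now)
print(conn.execute("SELECT nextrun FROM schedule").fetchone()[0])  # 2021-05-13 02:00:00
```

Note that the rollback only covers the database; side effects the job had elsewhere are not undone, and holding a transaction open during a long job may not suit every setup.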
Upvotes: 0