RKP

Reputation: 5385

Design pattern for modeling job execution flow

In my application I have a set of jobs to execute. Each job goes through the states "not started", "started", "completed", "failed", etc. Each job has a set of preconditions and post-conditions. A job cannot start until the preconditions are satisfied and should be marked as failed if it doesn't satisfy the post-conditions.

For example, let's say the job imports a text file into the database. The precondition would be to check that the source file exists, and the post-condition would be to check that the data exists in the database.

On top of these pre- and post-conditions, sometimes a job is also dependent on other jobs finishing. It is easy to create a jobs table and a dependency table for jobs, but is it actually possible to make these pre- and post-validation checks configurable in the database (so that no code changes need to be made if these conditions change or new conditions are added)? Even if it is possible somehow, is it a good idea to do so?
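One way to keep the checks configurable without storing code in the database is to store only condition *names* in the job record and map them to check functions registered in code. This is a hypothetical sketch (the table layout, condition names, and context keys are all illustrative, not taken from any real system):

```python
import os

# Registry mapping condition names (as stored in a hypothetical jobs table)
# to check functions defined in code.
CHECKS = {}

def check(name):
    def register(fn):
        CHECKS[name] = fn
        return fn
    return register

@check("source_file_exists")
def source_file_exists(ctx):
    # Precondition for the file-import example: the source file must exist.
    return os.path.exists(ctx["source_path"])

@check("rows_loaded")
def rows_loaded(ctx):
    # Post-condition: the import must have produced at least one row.
    return ctx.get("row_count", 0) > 0

def run_job(job_row, ctx, work):
    # job_row mimics a row from a hypothetical jobs table, e.g.
    # {"name": ..., "preconditions": [...], "postconditions": [...]}
    if not all(CHECKS[c](ctx) for c in job_row["preconditions"]):
        return "not started"
    work(ctx)
    if not all(CHECKS[c](ctx) for c in job_row["postconditions"]):
        return "failed"
    return "completed"
```

With this split, adding or rearranging conditions for an existing check is a data change; only a genuinely new *kind* of check requires code.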

There is a requirement to make this model generic so that other applications can also make use of it even if the validation checks to be performed are entirely different for other applications.

Upvotes: 2

Views: 3852

Answers (2)

ahoffer

Reputation: 6546

Consider integrating your application with a rules engine (also known as a business rules engine). The idea is that the logic is defined outside of the code and stored in a table or file; the rule engine reads the rules and interprets them.
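As a toy illustration of the idea (not how any particular product works), each rule can be a piece of data, e.g. a row loaded from a table, and a small interpreter evaluates it against the current context; the operator names and rule shape here are invented for the example:

```python
# Supported operators; each takes the context plus the rule's arguments.
OPS = {
    "exists": lambda ctx, field: field in ctx,
    "gt": lambda ctx, field, value: ctx.get(field, 0) > value,
}

def evaluate(rule, ctx):
    # rule example: {"op": "gt", "args": ["row_count", 0]}
    return OPS[rule["op"]](ctx, *rule["args"])

# Rules as data, e.g. loaded from a database table.
rules = [
    {"op": "exists", "args": ["source_path"]},
    {"op": "gt", "args": ["row_count", 0]},
]
ctx = {"source_path": "/data/feed.txt", "row_count": 42}
all(evaluate(r, ctx) for r in rules)  # True for this ctx
```

Real rules engines add much more (rule chaining, conflict resolution, authoring tools), which is exactly why they get complicated quickly.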

These pieces of software can get complicated pretty quickly. Mostly, they are commercial packages, but some free and open-source frameworks exist. I have not used any of the free packages. In general, I recommend looking into existing code rather than building a rules engine from scratch.

Nice intro by Martin Fowler: http://martinfowler.com/bliki/RulesEngine.html

An article with a little more substance: http://www.infoq.com/articles/Rule-Engines

To find actual code, Google on "rules engine" or "workflow engine" and add in the name of your programming language ("Java", "C#", or what-not).

Upvotes: 1

Joel Brown

Reputation: 14418

I think you run the risk of trying to table drive too much. By attempting to table drive all of the pre- and post-validation conditions, you are getting dangerously close to trying to write code in data.

I have built some pretty sophisticated job scheduling applications. One in particular that might be of interest was a daily ETL process that loaded dozens of SQL tables based on flat file feeds.

The existing system used a linear process where the programmer had to manually work out the inter-table dependencies and run the table loads in a given order. The problem with this was that if any process failed, the rest of the jobs had to sit and wait until the problem was resolved.

I built a new system that had table-driven metadata describing the immediate inter-table dependencies. In other words, table A has FKs to tables B and C. Instead of keeping track of all of the interdependencies manually, only the immediate interdependencies were tracked. Then the scheduler just had to look at which loads had completed and which loads hadn't. Any pending load which had no incomplete dependencies was OK to start.

I think you should build your system similarly. Use separation of concerns. Don't table drive how the dependencies are tested; just table drive which dependencies exist. You can track in your scheduling tables which of these dependencies have passed and which have failed. The database doesn't need to know how to run these tests. Let the code worry about what the dependencies are exactly and how to test whether they pass or fail. That is all your job scheduler needs to know. Avoid the temptation to create a scripting language whose source code sits in your database.
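The scheduling loop this describes can be sketched in a few lines; the job names and the in-memory `deps` dict stand in for the dependency table, and the loads are assumed to succeed for the sake of the sketch:

```python
# Hypothetical dependency table: job -> its immediate prerequisites only.
deps = {
    "A": {"B", "C"},
    "B": set(),
    "C": {"B"},
}
completed, failed = set(), set()

def runnable(job):
    # A pending job may start once every immediate dependency has completed.
    return job not in completed | failed and deps[job] <= completed

while True:
    ready = [j for j in deps if runnable(j)]
    if not ready:
        break
    for j in ready:
        # Run the load here; assume it succeeds for this sketch.
        completed.add(j)
```

Because only immediate dependencies are recorded, the full run order falls out at run time ("B", then "C", then "A" here), and a failure only blocks the jobs downstream of it rather than the whole linear sequence.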

Upvotes: 1
