JobRepository for several Spring Batch jobs

Question

Context

We are trying to establish standards for how to use Spring Batch in a large IT department with distinct business interests.

We will likely have several batches belonging to different business domains. We already know that some of them will have to fetch parameters (e.g. dates) from a table shared by all batches, Java and COBOL alike.

Volume

The number of Spring Batch jobs we will implement is hard to estimate. There is no plan to rewrite existing COBOL batches, and continuous-flow processing is encouraged wherever possible.

Proof-of-concept requests come up from time to time, but little conclusive work has come out of them so far.

We also already have a few idempotent batches, but they use map-based JobRepositories. However, a world where all batches are idempotent is a fantasy.

Question

One question for which I cannot find any documentation or recommendation is the following.

In this context, what is the best approach for the JobRepositories? Is it better to store all of them in one central database, or should each JAR or business unit have its own?

Addendum

Personal thoughts

It seems only logical to me to keep the batch metadata in the same place as the parameters. I do not think we can add tables to the schema that holds the parameters table, but we can probably create a view of it in the schema where we create the Spring Batch metadata tables.

The real question is whether it is better to have one metadata store for all batches or a separate one per JAR or business unit.
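
To make the two options concrete, here is a minimal plain-Java sketch. The names RepositoryLayouts and RepositoryTarget, the JDBC URLs and the domain names are all hypothetical and not part of Spring Batch. The sketch only decides where a domain's metadata would live; in a real setup the URL would back the DataSource of that domain's JobRepository, and the prefix would go to its table-prefix setting (for example JobRepositoryFactoryBean.setTablePrefix). Spring Batch's default prefix is BATCH_, which yields tables such as BATCH_JOB_INSTANCE.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Spring Batch): decides which database and which
 * set of metadata tables a business domain's JobRepository would use.
 */
public final class RepositoryLayouts {

    /** Where one JobRepository's metadata lives: a JDBC URL plus a table prefix. */
    public record RepositoryTarget(String jdbcUrl, String tablePrefix) { }

    /** Spring Batch's default metadata table prefix (BATCH_JOB_INSTANCE, BATCH_STEP_EXECUTION, ...). */
    public static final String DEFAULT_PREFIX = "BATCH_";

    private RepositoryLayouts() { }

    /**
     * Central layout: every domain uses the same database.
     * separateTables = false: all jobs share one BATCH_* table set (job names tell them apart).
     * separateTables = true: each domain gets its own prefixed set, e.g. BILLING_BATCH_JOB_INSTANCE.
     */
    public static RepositoryTarget central(String sharedUrl, String domain, boolean separateTables) {
        requireDomain(domain);
        String prefix = separateTables
                ? domain.toUpperCase(Locale.ROOT) + "_" + DEFAULT_PREFIX
                : DEFAULT_PREFIX;
        return new RepositoryTarget(sharedUrl, prefix);
    }

    /** Decentralised layout: each domain owns its database and keeps the default prefix. */
    public static RepositoryTarget perDomain(Map<String, String> urlsByDomain, String domain) {
        requireDomain(domain);
        String url = urlsByDomain.get(domain);
        if (url == null) {
            throw new IllegalArgumentException("No repository database configured for domain: " + domain);
        }
        return new RepositoryTarget(url, DEFAULT_PREFIX);
    }

    private static void requireDomain(String domain) {
        if (domain == null || domain.isBlank()) {
            throw new IllegalArgumentException("domain must not be blank");
        }
    }

    public static void main(String[] args) {
        // Illustrative URLs and domain names only.
        System.out.println(central("jdbc:db2://central-host:50000/BATCHDB", "billing", true));
        System.out.println(perDomain(Map.of("billing", "jdbc:oracle:thin:@billing-host:1521/BILLING"), "billing"));
    }
}
```

Sharing one BATCH_* table set gives a single place to query every execution; per-domain prefixes or per-domain databases trade that for isolation between business units. Note that the schema scripts shipped with Spring Batch create BATCH_* tables, so a custom prefix means adapting them.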

Wish for official recommendations

I did not find any recommendation on this in the Spring Batch documentation, so I would be immensely grateful if you could include a link to official recommendations. If that is not possible, any enlightened advice is welcome.

Michael Pralow · Accepted Answer

There is no "best" approach; it has to be tailored to your requirements.

Pro central JobRepository

- centralism in your IT department
- one scheduler (i.e. one system to operate the jobs)
  - weak argument, more an indicator of a "central" culture
- one infrastructure to run the jobs (e.g. an application server cluster)
- downtimes of the central database (e.g. for maintenance) are not a problem for you

Contra central JobRepository

- no centralism in your IT department
- take all of the above and reverse it :-)
- the business data is not on the same database as the JobRepository (e.g. DB2 and Oracle)
  - downtime problems ahead...
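
The downtime point can be made concrete with a back-of-the-envelope calculation: a job can only run when every database it touches is up. Assuming outages of the two databases are independent (an assumption; coordinated maintenance windows would change the picture), the job's availability is the product of the databases' availabilities. The class name JobAvailability and the 0.99 figures are hypothetical, chosen only for illustration.

```java
/**
 * Back-of-the-envelope sketch: a job needs every database it touches to be up.
 * Assumes independent outages; the availability figures in main are illustrative only.
 */
public final class JobAvailability {

    private JobAvailability() { }

    /** Probability that all required databases are up at once, assuming independent failures. */
    public static double combined(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            if (a < 0.0 || a > 1.0) {
                throw new IllegalArgumentException("availability must be in [0, 1]: " + a);
            }
            result *= a;
        }
        return result;
    }

    public static void main(String[] args) {
        double businessDb = 0.99; // hypothetical figure
        double repoDb = 0.99;     // hypothetical figure
        // JobRepository on the business database: only that database must be up.
        System.out.printf("same database:      %.4f%n", combined(businessDb));
        // JobRepository on a separate (e.g. central) database: both must be up.
        System.out.printf("separate databases: %.4f%n", combined(businessDb, repoDb));
    }
}
```

With these figures the separate-database setup drops from 0.99 to 0.9801, i.e. the expected downtime roughly doubles (from 1% to about 2%).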