Karim

Reputation: 35

Quartz Clustered Mode with Spring Boot & Docker: Jobs Not Failing Over to Other Node

I have a Spring Boot application running two instances in Docker, using Quartz in clustered mode with a shared MySQL database. However, when I shut down one instance, the jobs and traffic do not switch to the other node as expected.

My Quartz Configuration (application.yml):

quartz:
  job-store-type: jdbc
  scheduler-name: scheduler
  jdbc:
    initialize-schema: never  
  properties:
    org.quartz.jobStore.class: org.quartz.impl.jdbcjobstore.JobStoreTX
    org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
    org.quartz.jobStore.dataSource: dataSource
    org.quartz.jobStore.tablePrefix: QRTZ_
    org.quartz.jobStore.isClustered: true
    org.quartz.jobStore.useProperties: true
    org.quartz.jobStore.clusterCheckinInterval: 2000
    org.quartz.scheduler.instanceId: AUTO
    org.quartz.scheduler.instanceName: MyScheduler
    org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
    org.quartz.threadPool.threadCount: 2
    org.quartz.threadPool.threadPriority: 5
    org.quartz.scheduler.wrapJobExecutionInUserTransaction: false
    org.quartz.dataSource.dataSource.driver: com.mysql.cj.jdbc.Driver
    org.quartz.dataSource.dataSource.URL: jdbc:mysql://localhost:3306/replica_db
    org.quartz.dataSource.dataSource.user: root
    org.quartz.dataSource.dataSource.password: admin
    org.quartz.jobStore.misfireThreshold: 20000  

Problem:

I expect that when one instance shuts down, its running jobs move to the other node. However, when I stop an instance, jobs stop running altogether instead of failing over. The database (QRTZ_ tables) is correctly set up, and the logs don’t show errors related to Quartz clustering.

What I Have Tried:

Checked database locks: Jobs seem to be stuck in the database but are not picked up by the other node.
Misfire handling: Set misfireThreshold: 20000, but jobs are not retried on failover.
Verified Cluster Check-in Interval: Set to 2000ms, so the cluster should detect node failure quickly.
Confirmed Scheduler Names: Both instances share the same instanceName.
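On the scheduler names: the configuration above actually sets two different names, scheduler-name: scheduler and org.quartz.scheduler.instanceName: MyScheduler. Quartz 2.x keys its rows by the SCHED_NAME column, and all nodes of a cluster must use the same scheduler name. I am not certain which of the two values takes effect here, so the fragment below is only a sketch that keeps them aligned so there is no ambiguity. Using MyScheduler for both is an arbitrary choice, not something confirmed to fix the failover.

```yaml
quartz:
  # Assumption: keep this identical to instanceName below so only one
  # scheduler name can end up in the SCHED_NAME column of the QRTZ_ tables.
  scheduler-name: MyScheduler
  properties:
    org.quartz.scheduler.instanceName: MyScheduler
    # AUTO lets each node generate its own unique instance id.
    org.quartz.scheduler.instanceId: AUTO
```

If rows already exist under the other name, they would stay associated with that name until they are rescheduled or migrated.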

Questions:

Why isn’t Quartz failing over jobs to the other instance?
How can I ensure jobs resume execution on the remaining active node when one instance goes down?
Are there additional Quartz settings needed for proper failover in a Dockerized Spring Boot setup?
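One Docker-specific detail that may be relevant to the last question: the JDBC URL in the configuration uses localhost, which resolves to the container itself when the containers run on a bridge network rather than the host network. A sketch of pointing both instances at one shared database host follows. The hostname mysql-db is hypothetical (for example, a Docker Compose service name) and is not part of the setup described above.

```yaml
quartz:
  properties:
    # Hypothetical hostname: replace mysql-db with whatever name both
    # containers can resolve to the single shared MySQL server.
    org.quartz.dataSource.dataSource.URL: jdbc:mysql://mysql-db:3306/replica_db
```

If the containers already share the host network, localhost reaches the same server from both instances and this change is unnecessary.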
