Our Spring Boot application relies on information stored in Kafka to answer REST requests. It does so by retrieving information from a global state store via the InteractiveQueryService.
Unfortunately, during a graceful shutdown, ongoing requests lead to the following error:
java.lang.IllegalStateException: Error retrieving state store: my-global-store-name-v0
at org.springframework.cloud.stream.binder.kafka.streams.InteractiveQueryService.lambda$getQueryableStore$1(InteractiveQueryService.java:153)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:344)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:217)
at org.springframework.cloud.stream.binder.kafka.streams.InteractiveQueryService.getQueryableStore(InteractiveQueryService.java:103)
...
Our analysis showed that this error is caused by the ordering of the lifecycle phases: WebServerGracefulShutdownLifecycle, which handles the graceful shutdown, has a phase of Integer.MAX_VALUE - 1024, while StreamsBuilderFactoryManager has a phase of Integer.MAX_VALUE - 100. Because beans with a higher phase are stopped first, StreamsBuilderFactoryManager is shut down before the graceful shutdown is initiated, so requests still in flight during the graceful shutdown no longer have access to the global state store.
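To make the ordering concrete, here is a minimal sketch in plain Java (no Spring involved) that only mimics the "highest phase stops first" rule. The class name PhaseOrderDemo and the bean name webServerGracefulShutdown are illustrative assumptions, and the phase values are hard-coded from the figures quoted above rather than read from the actual Spring classes.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: it does not use Spring, it only mimics the
// "highest phase is stopped first" ordering described above.
class PhaseOrderDemo {

    // Phase values as quoted in this question (assumed, not read from Spring).
    static final int GRACEFUL_SHUTDOWN_PHASE = Integer.MAX_VALUE - 1024; // 2147482623
    static final int STREAMS_MANAGER_PHASE = Integer.MAX_VALUE - 100;    // 2147483547

    // Bean names mapped to their phases; the names are illustrative.
    static Map<String, Integer> examplePhases() {
        Map<String, Integer> phases = new LinkedHashMap<>();
        phases.put("webServerGracefulShutdown", GRACEFUL_SHUTDOWN_PHASE);
        phases.put("streamsBuilderFactoryManager", STREAMS_MANAGER_PHASE);
        return phases;
    }

    // Returns the bean names in stop order: descending by phase.
    static List<String> stopOrder(Map<String, Integer> phases) {
        return phases.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> phases = examplePhases();
        for (String name : stopOrder(phases)) {
            System.out.println("stop " + name + " (phase " + phases.get(name) + ")");
        }
    }
}
```

With these values the Kafka Streams manager sorts ahead of the graceful-shutdown lifecycle, which matches the order of the phases in the logs.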
This is indeed visible in the logs:
2025-01-08 17:04:16.225 DEBUG org.springframework.context.support.DefaultLifecycleProcessor [Stopping beans in phase 2147483547]
...
2025-01-08 17:04:16.572 DEBUG org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl [Closing global storage engine my-global-store-name-v0]
...
2025-01-08 17:04:16.577 DEBUG org.springframework.context.support.DefaultLifecycleProcessor [Bean 'streamsBuilderFactoryManager' completed its stop procedure]
...
2025-01-08 17:04:16.584 DEBUG org.springframework.context.support.DefaultLifecycleProcessor [Stopping beans in phase 2147482623]
2025-01-08 17:04:16.584 INFO org.springframework.boot.web.embedded.tomcat.GracefulShutdown [Commencing graceful shutdown. Waiting for active requests to complete]
2025-01-08 17:04:16.587 INFO org.springframework.boot.web.embedded.tomcat.GracefulShutdown [Graceful shutdown complete]
The Javadoc of StreamsBuilderFactoryManager highlights the fact that choosing a phase close to Integer.MAX_VALUE was a conscious decision:
* This {@link SmartLifecycle} class ensures that the bean created from it is started very
* late through the bootstrap process by setting the phase value closer to
* Integer.MAX_VALUE. This is to guarantee that the {@link StreamsBuilderFactoryBean} on a
* function with multiple bindings is only started after all the binding phases have completed successfully.
Also, the constant AbstractMessageListenerContainer.DEFAULT_PHASE, which is not used by the StreamsBuilderFactoryManager but which has the same value of Integer.MAX_VALUE - 100, makes the following claim (which unfortunately I was unable to verify):
// The default org.springframework.context.SmartLifecycle phase for listener containers 2147483547.
Based on the analysis above, is it a bug that the phase chosen for StreamsBuilderFactoryManager is higher than that of WebServerGracefulShutdownLifecycle (in which case I am happy to create an issue on the GitHub project and provide a fix)?
If not, how can we ensure that ongoing requests still have access to the global state store during the graceful shutdown so that they can be answered properly?