Reputation: 183
I am using the ClusterHealthCheck procedure in my health checks, and I am seeing a Vert.x "Thread blocked" exception in the logs. Do I need to execute the health check as blocking code?
Below is the code I am using to create the health checks:
Handler<Promise<Status>> procedure = ClusterHealthCheck.createProcedure(vertx);
HealthChecks checks = HealthChecks.create(vertx).register("cluster-health", procedure);
I am getting the exception below:
23:00:18.396 [vertx-blocked-thread-checker] WARN io.vertx.core.impl.BlockedThreadChecker - tx.id= Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 5194 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
at sun.misc.Unsafe.park(Native Method) ~[?:1.8.0_292]
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) ~[?:1.8.0_292]
at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:575) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.internal.partition.impl.PartitionReplicaStateChecker.hasOnGoingMigrationMaster(PartitionReplicaStateChecker.java:309) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.internal.partition.impl.PartitionReplicaStateChecker.getPartitionServiceState(PartitionReplicaStateChecker.java:93) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.internal.partition.impl.InternalPartitionServiceImpl.isMemberStateSafe(InternalPartitionServiceImpl.java:929) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.internal.partition.operation.SafeStateCheckOperation.run(SafeStateCheckOperation.java:38) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:227) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:216) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:406) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:433) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:596) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:581) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:540) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:237) ~[app.jar:1.0.0-SNAPSHOT]
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnTarget(OperationServiceImpl.java:343) ~[app.jar:1.0.0-SNAPSHOT]
I looked into the ClusterHealthCheck class. It looks simple: it gets the partition service from the cluster manager and reads the cluster status, so I am not sure why it takes more than 5 seconds to complete.
I am using Vert.x 4.0.3 and Hazelcast 4.0.2, and the application is running in K8s.
Upvotes: 1
Views: 422
Reputation: 9128
The problem has been fixed in Vert.x 4.1.1.
The operation may be blocked if the node is in bad condition (e.g. low memory).
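If upgrading is not possible right away, the general way to avoid this warning is to keep the potentially blocking call off the event loop and only hand the result back when it is ready. Below is a minimal plain-JDK sketch of that pattern, not the actual Vert.x or Hazelcast code: `OffloadedHealthCheck`, `slowClusterCheck` and `runOffLoop` are hypothetical names, and the 200 ms sleep merely stands in for a cluster-state call that may park its thread. In Vert.x itself, `vertx.executeBlocking(...)` is the facility for running such code on a worker thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class OffloadedHealthCheck {

    // Hypothetical stand-in for a cluster-state check that may block its calling thread.
    static String slowClusterCheck() {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "DOWN";
        }
        return "UP";
    }

    // Runs the blocking check on the given worker executor instead of the caller's thread.
    // The caller gets a future back immediately and is never parked by the check itself.
    static CompletableFuture<String> runOffLoop(Supplier<String> blockingCheck, Executor workers) {
        return CompletableFuture.supplyAsync(blockingCheck, workers);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            long start = System.nanoTime();
            CompletableFuture<String> result =
                runOffLoop(OffloadedHealthCheck::slowClusterCheck, workers);
            long submitMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            // The submitting thread is free again long before the check finishes.
            System.out.println("caller was free again after " + submitMs + " ms");

            // join() is used here only so the demo can print the outcome before exiting;
            // event-loop code would attach a callback (e.g. thenAccept) instead.
            System.out.println("status: " + result.join());
        } finally {
            workers.shutdown();
        }
    }
}
```

The point of the design is that the slow call still takes as long as it takes, but it ties up a worker thread rather than the thread that serves all other requests.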
Upvotes: 1