vel

Reputation: 183

Vert.x built-in ClusterHealthCheck procedure blocks the event loop thread

I am using the ClusterHealthCheck procedure in my health checks, and I am seeing a Vert.x "Thread blocked" exception in the logs. Do I need to execute the health check as blocking code?

Below is the code I use to create the health checks:

Handler<Promise<Status>> procedure = ClusterHealthCheck.createProcedure(vertx);
HealthChecks checks = HealthChecks.create(vertx).register("cluster-health", procedure);

I get the following exception:

23:00:18.396 [vertx-blocked-thread-checker] WARN  io.vertx.core.impl.BlockedThreadChecker - tx.id= Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 5194 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
    at sun.misc.Unsafe.park(Native Method) ~[?:1.8.0_292]
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) ~[?:1.8.0_292]
    at com.hazelcast.spi.impl.AbstractInvocationFuture.joinInternal(AbstractInvocationFuture.java:575) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.internal.partition.impl.PartitionReplicaStateChecker.hasOnGoingMigrationMaster(PartitionReplicaStateChecker.java:309) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.internal.partition.impl.PartitionReplicaStateChecker.getPartitionServiceState(PartitionReplicaStateChecker.java:93) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.internal.partition.impl.InternalPartitionServiceImpl.isMemberStateSafe(InternalPartitionServiceImpl.java:929) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.internal.partition.operation.SafeStateCheckOperation.run(SafeStateCheckOperation.java:38) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:184) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:227) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:216) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:406) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:433) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:596) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:581) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:540) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:237) ~[app.jar:1.0.0-SNAPSHOT]
    at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnTarget(OperationServiceImpl.java:343) ~[app.jar:1.0.0-SNAPSHOT]

I looked at the ClusterHealthCheck class and it looks simple: it gets the partition service from the cluster manager and reads the cluster status. I am not sure why it takes more than 5 seconds to complete.

I am using Vert.x 4.0.3 and Hazelcast 4.0.2, and the application is running in K8s.

Upvotes: 1

Views: 422

Answers (1)

tsegismont

Reputation: 9128

The problem has been fixed in Vert.x 4.1.1.

The operation can block if the node is in poor condition (e.g. low on memory).
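If upgrading is not immediately possible, one possible workaround is to run the procedure on a worker thread instead of the event loop, which is what the question asks about. The following is a minimal sketch, assuming the Vert.x 4.x callback-style `executeBlocking` API; the class name `OffloadedClusterHealthCheck` and the helper `offload` are hypothetical names introduced only for illustration.

```java
import io.vertx.core.Handler;
import io.vertx.core.Promise;
import io.vertx.core.Vertx;
import io.vertx.ext.healthchecks.HealthChecks;
import io.vertx.ext.healthchecks.Status;
import io.vertx.spi.cluster.hazelcast.ClusterHealthCheck;

public class OffloadedClusterHealthCheck {

  // Hypothetical helper: wraps a health check procedure so that it runs
  // on a worker thread rather than on the calling event loop thread.
  static Handler<Promise<Status>> offload(Vertx vertx, Handler<Promise<Status>> procedure) {
    // Promise<Status> is also a Handler<AsyncResult<Status>>, so the health
    // check promise can be passed directly as the result handler.
    // ordered=false lets independent checks run in parallel on the worker pool.
    return promise -> vertx.<Status>executeBlocking(procedure, false, promise);
  }

  // Same registration as in the question, with the procedure offloaded.
  // Assumes vertx is a clustered instance using the Hazelcast cluster manager.
  static HealthChecks register(Vertx vertx) {
    Handler<Promise<Status>> procedure = ClusterHealthCheck.createProcedure(vertx);
    return HealthChecks.create(vertx).register("cluster-health", offload(vertx, procedure));
  }
}
```

With this wrapper, a slow Hazelcast call would occupy a worker thread for its duration instead of the event loop. It does not make the underlying call faster, so a node in bad condition may still report late or time out.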

Upvotes: 1
