Reputation: 93
We use a distributed Hazelcast map in our project, and have recently run into a problem with extremely high get latencies. We used IMap.get(...)
calls, which in some cases took hours to complete. After this incident, we switched to IMap.getAsync(...)
API with timeouts, which resolves the problem for us, but I am curious if anybody had a similar issue.
Our Hazelcast version is 3.9.0. During the incident we had hazelcast.operation.call.timeout.millis
set to 5000, and async-backup-count="3"
with read-backup-data="true"
. We also had sporadic CPU usage spikes (up to 100% for several minutes) on some hosts due to unrelated background processing, which could potentially affect Hazelcast.
The only suspicious thing that we found in the logs was that around the incident all hosts were complaing about one particular host, like this:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739863 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739864 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739852 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739870 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
WARNING: [hostX]:5702 [dev] [3.9] No Invocation found for call timeout response with callId739874 sent from [hostY]:5702
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InboundResponseHandler
In the hostY
's logs:
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] MonitorInvocationsTask delayed 14294 ms
Oct 24, 2018 3:53:01 PM com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor
WARNING: [hostY]:5702 [dev] [3.9] BroadcastOperationControlTask delayed 13544 ms
Any ideas?
Upvotes: 3
Views: 3982
Reputation: 1949
From the hostY
's logs, it seems hostY
suffering from GC pauses. MonitorInvocationsTask
is scheduled to run every seconds but it says its execution is delayed for 14 seconds. And BroadcastOperationControlTask
should be scheduled nearly every second due to your config (hazelcast.operation.call.timeout.millis / 4 = 1250 ms
) but similarly it's delayed for 13 seconds.
You can verify this by enabling GC logs. Additionally Hazelcast should print HealthMonitor
logs periodically when memory and/or CPU usage exceeds some threshold.
Upvotes: 3