Reputation: 1
I am trying to set up a 3-node Hazelcast cluster, version 5.2.3, running on Windows Server 2016. In the Hazelcast client logs I see this error:
`2023-05-04/14:31:40.170 ERROR ActiveMQ Session Task Failed to create and replace cache com.hazelcast.core.OperationTimeoutException: ClientInvocation{clientMessage = ClientMessage{connection=null, length=538, operation=Map.Set, isRetryable=false, correlationId=1067978, messageType=10f00, isEvent=false, isfragmented=false}, objectName = OBJECTS_CACHE, target = partition 7, sentConnection = null} timed out because exception occurred after client invocation timeout 120000 ms. Current time: 2023-05-04 14:31:40.167. Start time: 2023-05-04 14:27:42.612. Total elapsed time: 237555 ms. at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyExceptionWithOwnedPermission(ClientInvocation.java:341) at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyException(ClientInvocation.java:306) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.handleResponse(ClientResponseHandlerSupplier.java:164) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.process(ClientResponseHandlerSupplier.java:141) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.access$300(ClientResponseHandlerSupplier.java:60) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:251) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:243) at com.hazelcast.client.impl.connection.tcp.TcpClientConnection.handleClientMessage(TcpClientConnection.java:247) at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.handleMessage(ClientMessageDecoder.java:135) at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.onRead(ClientMessageDecoder.java:89) at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:137) at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:383) at 
com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:368) at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:294) at com.hazelcast.internal.networking.nio.NioThread.executeRun(NioThread.java:249) at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) at ------ submitted from ------.() at com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:348) at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:112) at com.hazelcast.client.impl.spi.impl.ClientInvocationFuture.resolveAndThrowIfException(ClientInvocationFuture.java:95) at com.hazelcast.client.impl.spi.impl.ClientInvocationFuture.resolveAndThrowIfException(ClientInvocationFuture.java:40) at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617) at com.hazelcast.client.impl.spi.ClientProxy.invokeOnPartition(ClientProxy.java:188) at com.hazelcast.client.impl.spi.ClientProxy.invoke(ClientProxy.java:182) at com.hazelcast.client.impl.proxy.ClientMapProxy.setInternal(ClientMapProxy.java:690) at com.hazelcast.client.map.impl.nearcache.NearCachedClientMapProxy.setInternal(NearCachedClientMapProxy.java:349) at com.hazelcast.client.impl.proxy.ClientMapProxy.set(ClientMapProxy.java:664) at com.hazelcast.client.impl.proxy.ClientMapProxy.set(ClientMapProxy.java:1556) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheManager.set(HazelcastCacheManager.java:131) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putInCache(HazelcastCacheController.java:272) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putInCache(HazelcastCacheController.java:269) at 
com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putObjInCache(HazelcastCacheController.java:248) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putObjInCache(HazelcastCacheController.java:244) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.replaceCacheObject(HazelcastCacheController.java:189) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.replaceCacheObject(HazelcastCacheController.java:179) at com.keshettv.keshetcoreinfra.service.cache.manager.impl.OSCacheConfigManager.replaceCacheObject(OSCacheConfigManager.java:47) at com.keshettv.keshetcoreinfra.service.cache.manager.impl.CacheRefreshService.refreshCache(CacheRefreshService.java:43) at sun.reflect.GeneratedMethodAccessor1591.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.keshettv.keshetcoreinfra.service.messaging.invoker.ServiceInvoker.onMessage(ServiceInvoker.java:117) at org.apache.activemq.ActiveMQMessageConsumer.dispatch(ActiveMQMessageConsumer.java:967) at org.apache.activemq.ActiveMQSessionExecutor.dispatch(ActiveMQSessionExecutor.java:122) at org.apache.activemq.ActiveMQSessionExecutor.iterate(ActiveMQSessionExecutor.java:192) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! 
this: [172.16.12.105]:5701, partitionId: 7, operation: com.hazelcast.map.impl.operation.SetOperation, service: hz:impl:mapService at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.ensureNoPartitionProblems(OperationRunnerImpl.java:412) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.metWithPreconditions(OperationRunnerImpl.java:227) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:263) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:219) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123) at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
`
I have this hazelcast configuration:
`
hazelcast:
  listeners:
    - com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.ClusterMembershipListener
  cluster-name: development
  properties:
    hazelcast.jmx: true
    hazelcast.socket.connect.timeout.seconds: 10
    hazelcast.logging.type: log4j
    hazelcast.jet.enabled: true
  network:
    reuse-address: true
    port:
      auto-increment: true
      port: 5701
    outbound-ports:
      ports: 34500
    join:
      auto-detection:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - [replaced the ip list]
        connection-timeout-seconds: 300
    interfaces:
      enabled: true
      interfaces:
        - [replaced the ip list]
    ssl:
      enabled: false
      properties:
        protocol: TLSv1.2
        mutualAuthentication: REQUIRED
        keyStore: /opt/hazelcast.keystore
        keyStorePassword: secret.97531
        keyStoreType: PKCS12
        trustStore: /opt/hazelcast.truststore
        trustStorePassword: changeit
        trustStoreType: PKCS12
        keyMaterialDuration: PT10M
    failure-detector:
      icmp:
        enabled: false
        timeout-milliseconds: 1000
        fail-fast-on-startup: true
        interval-milliseconds: 1000
        max-attempts: 2
        parallel-mode: true
        ttl: 255
    symmetric-encryption:
      enabled: false
      algorithm: PBEWithMD5AndDES
      salt: thesalt
      password: thepass
      iteration-count: 19
  executor-service:
    default:
      statistics-enabled: true
      pool-size: 16
      queue-capacity: 0
  durable-executor-service:
    default:
      pool-size: 16
      durability: 1
      capacity: 100
  scheduled-executor-service:
    default:
      pool-size: 16
      durability: 1
      capacity: 100
      capacity-policy: PER_NODE
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
  set:
    default:
      statistics-enabled: false
      backup-count: 1
      async-backup-count: 0
      max-size: 10
  queue:
    default:
      statistics-enabled: true
      max-size: 0
      backup-count: 1
      async-backup-count: 0
      empty-queue-ttl: -1
      queue-store:
        class-name: com.hazelcast.QueueStoreImpl
        properties:
          binary: false
          memory-limit: 1000
          bulk-load: 500
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
  map:
    default:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: ALWAYS
      backup-count: 0
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 0
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      merkle-tree:
        enabled: false
        depth: 10
      event-journal:
        enabled: false
        capacity: 10000
        time-to-live-seconds: 0
    OBJECTS_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: OBJECTS_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    OBJECTS_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: OBJECTS_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    AXIS_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 1000
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 1000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: AXIS_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    AXIS_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 1000
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 1000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: AXIS_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    HTML_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: HTML_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    HTML_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: HTML_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000`
The client is also version 5.2.3. Let me know if any other configuration info is needed. Hazelcast does form a 3-node cluster, but one of the nodes doesn't own any partitions. I would like to understand why that is happening and how to fix it.
Upvotes: 0
Views: 638
Reputation: 485
I'm having similar issues with PartitionMigratingException. For a five-minute cache (with three sync backups and one async backup configured on the cluster) on my 3-node + Management Center cluster, it started giving me this error too, and it made my endpoints stop working.
From what I understand, we need to find a balance between the cache timeouts, the partitions (the groups of keys that Hazelcast spreads across the cluster members and moves between them when the cluster rebalances), and the cluster we set up. The problem was that a migration was in progress on my node, and writing a new cache entry to a partition is not allowed while that partition is migrating.
From what I understand, I need to:
After searching for a while, I'm still in the process of finding a working balance.
In my Hazelcast config YAML file, I will be adding these settings and testing the result:
partition-group:
  enabled: true
  group-type: HOST_AWARE
  member-group:
    - 192.168.1.11
    - 192.168.1.12
    - 192.168.1.13
map:
  my-map:
    backup-count: 1
    async-backup-count: 0
    time-to-live-seconds: 300 # 5 minutes
    max-idle-seconds: 0 # Entries won't expire due to idle time
    eviction-policy: LRU
    eviction-percentage: 25 # Evict 25% of entries when the map reaches its maximum size
    max-size-policy: PER_NODE
    max-size: 10000 # Maximum size of the map per node
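For comparison, here is a sketch of the same map written in the nested `eviction:` layout that the 5.2.3 configuration in the question uses. I'm assuming Hazelcast 5.x here. As far as I can tell, `eviction-percentage` has no counterpart in that layout, so it is left out, and explicit `member-group` lists seem to apply only to `group-type: CUSTOM`, so they are dropped for `HOST_AWARE`. `my-map` is just a placeholder name.

```yaml
hazelcast:
  partition-group:
    enabled: true
    # Assumption: HOST_AWARE groups members by host on its own,
    # so no member-group list is given here.
    group-type: HOST_AWARE
  map:
    my-map:                       # placeholder map name
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 300   # 5 minutes
      max-idle-seconds: 0         # entries won't expire due to idle time
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 10000               # maximum number of entries per node
```

I have not verified this against a running cluster yet, so treat it as a starting point for testing.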
Other things to consider are these settings:
# **Enhancements:**
# 1. Migration interval and threshold:
migration-interval: 120 # Migrate every 2 minutes (adjust based on workload)
migration-threshold: 75 # Trigger migration when imbalance exceeds 75% (adjust based on impact tolerance)
# 2. Near cache:
near-cache-config:
  time-to-live-seconds: 300 # Cache entries locally for 5 minutes as well
# 3. Max entries:
max-entries: 20000 # Set a limit on total entries (optional, adjust based on memory and data volume)
# 4. Merge interval:
merge-interval: 300 # Merge smaller entries every 5 minutes (optimize memory usage)
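I have not been able to confirm the key names above against the Hazelcast documentation. As far as I can tell, Hazelcast 5.x exposes migration tuning through system properties under `properties:` instead. A sketch under that assumption, using the property names as I understand them from the system-properties reference (the values are illustrative, not recommendations):

```yaml
hazelcast:
  properties:
    # Pause between partition migration tasks, in seconds
    hazelcast.partition.migration.interval: 1
    # Timeout for a single migration operation, in seconds
    hazelcast.partition.migration.timeout: 300
    # Upper bound on partition migrations running in parallel
    hazelcast.partition.max.parallel.migrations: 5
```

Slowing migrations down only spreads the load out; operations on a partition are still rejected while that partition is migrating.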
I'm still in the research and testing phase; any updates would be good for anyone who has
Upvotes: 0