Ohad Behore

Reputation: 1

Hazelcast 5.2.3 - PartitionMigratingException: Partition is migrating - community edition 3 node cluster has errors

I am trying to set up a 3-node Hazelcast cluster, version 5.2.3, running on Windows Server 2016. In the Hazelcast client logs I see this error:

`2023-05-04/14:31:40.170 ERROR ActiveMQ Session Task Failed to create and replace cache com.hazelcast.core.OperationTimeoutException: ClientInvocation{clientMessage = ClientMessage{connection=null, length=538, operation=Map.Set, isRetryable=false, correlationId=1067978, messageType=10f00, isEvent=false, isfragmented=false}, objectName = OBJECTS_CACHE, target = partition 7, sentConnection = null} timed out because exception occurred after client invocation timeout 120000 ms. Current time: 2023-05-04 14:31:40.167. Start time: 2023-05-04 14:27:42.612. Total elapsed time: 237555 ms. at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyExceptionWithOwnedPermission(ClientInvocation.java:341) at com.hazelcast.client.impl.spi.impl.ClientInvocation.notifyException(ClientInvocation.java:306) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.handleResponse(ClientResponseHandlerSupplier.java:164) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.process(ClientResponseHandlerSupplier.java:141) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier.access$300(ClientResponseHandlerSupplier.java:60) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:251) at com.hazelcast.client.impl.spi.impl.ClientResponseHandlerSupplier$DynamicResponseHandler.accept(ClientResponseHandlerSupplier.java:243) at com.hazelcast.client.impl.connection.tcp.TcpClientConnection.handleClientMessage(TcpClientConnection.java:247) at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.handleMessage(ClientMessageDecoder.java:135) at com.hazelcast.client.impl.protocol.util.ClientMessageDecoder.onRead(ClientMessageDecoder.java:89) at com.hazelcast.internal.networking.nio.NioInboundPipeline.process(NioInboundPipeline.java:137) at com.hazelcast.internal.networking.nio.NioThread.processSelectionKey(NioThread.java:383) at 
com.hazelcast.internal.networking.nio.NioThread.processSelectionKeys(NioThread.java:368) at com.hazelcast.internal.networking.nio.NioThread.selectLoop(NioThread.java:294) at com.hazelcast.internal.networking.nio.NioThread.executeRun(NioThread.java:249) at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) at ------ submitted from ------.() at com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:348) at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:112) at com.hazelcast.client.impl.spi.impl.ClientInvocationFuture.resolveAndThrowIfException(ClientInvocationFuture.java:95) at com.hazelcast.client.impl.spi.impl.ClientInvocationFuture.resolveAndThrowIfException(ClientInvocationFuture.java:40) at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617) at com.hazelcast.client.impl.spi.ClientProxy.invokeOnPartition(ClientProxy.java:188) at com.hazelcast.client.impl.spi.ClientProxy.invoke(ClientProxy.java:182) at com.hazelcast.client.impl.proxy.ClientMapProxy.setInternal(ClientMapProxy.java:690) at com.hazelcast.client.map.impl.nearcache.NearCachedClientMapProxy.setInternal(NearCachedClientMapProxy.java:349) at com.hazelcast.client.impl.proxy.ClientMapProxy.set(ClientMapProxy.java:664) at com.hazelcast.client.impl.proxy.ClientMapProxy.set(ClientMapProxy.java:1556) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheManager.set(HazelcastCacheManager.java:131) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putInCache(HazelcastCacheController.java:272) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putInCache(HazelcastCacheController.java:269) at 
com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putObjInCache(HazelcastCacheController.java:248) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.putObjInCache(HazelcastCacheController.java:244) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.replaceCacheObject(HazelcastCacheController.java:189) at com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HazelcastCacheController.replaceCacheObject(HazelcastCacheController.java:179) at com.keshettv.keshetcoreinfra.service.cache.manager.impl.OSCacheConfigManager.replaceCacheObject(OSCacheConfigManager.java:47) at com.keshettv.keshetcoreinfra.service.cache.manager.impl.CacheRefreshService.refreshCache(CacheRefreshService.java:43) at sun.reflect.GeneratedMethodAccessor1591.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.keshettv.keshetcoreinfra.service.messaging.invoker.ServiceInvoker.onMessage(ServiceInvoker.java:117) at org.apache.activemq.ActiveMQMessageConsumer.dispatch(ActiveMQMessageConsumer.java:967) at org.apache.activemq.ActiveMQSessionExecutor.dispatch(ActiveMQSessionExecutor.java:122) at org.apache.activemq.ActiveMQSessionExecutor.iterate(ActiveMQSessionExecutor.java:192) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! 
this: [172.16.12.105]:5701, partitionId: 7, operation: com.hazelcast.map.impl.operation.SetOperation, service: hz:impl:mapService at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.ensureNoPartitionProblems(OperationRunnerImpl.java:412) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.metWithPreconditions(OperationRunnerImpl.java:227) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:263) at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:219) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139) at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123) at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)

`

I have this hazelcast configuration:

`

hazelcast:
  listeners:
    - com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.ClusterMembershipListener
  cluster-name: development
  properties:
    hazelcast.jmx: true
    hazelcast.socket.connect.timeout.seconds: 10
    hazelcast.logging.type: log4j
    hazelcast.jet.enabled: true
  network:
    reuse-address: true
    port:
      auto-increment: true
      port: 5701
    outbound-ports:
      ports: 34500
    join:
      auto-detection:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - [replaced the ip list]
        connection-timeout-seconds: 300
    interfaces:
      enabled: true
      interfaces: 
        - [replaced the ip list]
    ssl:
      enabled: false
      properties:
        protocol: TLSv1.2
        mutualAuthentication: REQUIRED
        keyStore: /opt/hazelcast.keystore
        keyStorePassword: secret.97531
        keyStoreType: PKCS12
        trustStore: /opt/hazelcast.truststore
        trustStorePassword: changeit
        trustStoreType: PKCS12
        keyMaterialDuration: PT10M
    failure-detector:
      icmp:
        enabled: false
        timeout-milliseconds: 1000
        fail-fast-on-startup: true
        interval-milliseconds: 1000
        max-attempts: 2
        parallel-mode: true
        ttl: 255
    symmetric-encryption:
      enabled: false
      algorithm: PBEWithMD5AndDES
      salt: thesalt
      password: thepass
      iteration-count: 19
  executor-service:
    default:
      statistics-enabled: true
      pool-size: 16
      queue-capacity: 0
  durable-executor-service:
    default:
      pool-size: 16
      durability: 1
      capacity: 100
  scheduled-executor-service:
    default:
      pool-size: 16
      durability: 1
      capacity: 100
      capacity-policy: PER_NODE
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
  set:
    default:
      statistics-enabled: false
      backup-count: 1
      async-backup-count: 0
      max-size: 10
  queue:
    default:
      statistics-enabled: true
      max-size: 0
      backup-count: 1
      async-backup-count: 0
      empty-queue-ttl: -1
      queue-store:
        class-name: com.hazelcast.QueueStoreImpl
        properties:
          binary: false
          memory-limit: 1000
          bulk-load: 500
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
  map:
    default:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: ALWAYS
      backup-count: 0
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 0
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      merkle-tree:
        enabled: false
        depth: 10
      event-journal:
        enabled: false
        capacity: 10000
        time-to-live-seconds: 0
    OBJECTS_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: OBJECTS_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    OBJECTS_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: OBJECTS_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    AXIS_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 1000
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 1000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: AXIS_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    AXIS_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 1000
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 1000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: AXIS_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    HTML_CACHE:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      #split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
      near-cache:
        in-memory-format: OBJECT
        invalidate-on-change: false
        time-to-live-seconds: 60
        eviction:
          eviction-policy: LRU
          max-size-policy: ENTRY_COUNT
          size: 1000
        cache-local-entries: true
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: HTML_CACHE
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000
    HTML_CACHE_only_new:
      in-memory-format: BINARY
      metadata-policy: CREATE_ON_UPDATE
      statistics-enabled: true
      per-entry-stats-enabled: false
      cache-deserialized-values: NEVER
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 0
      max-idle-seconds: 0
      eviction:
        eviction-policy: LRU
        #max-size-policy: PER_NODE
        #size: 7000
        max-size-policy: USED_HEAP_PERCENTAGE
        size: 10
      merge-policy:
        batch-size: 100
        class-name: PutIfAbsentMergePolicy
      read-backup-data: false
      map-store:
        enabled: true
        initial-mode: LAZY
        class-name: com.keshettv.keshetcoreinfra.service.cache.hazelcast.manager.impl.HtmlMapStore
        write-delay-seconds: 60
        write-batch-size: 10000
        write-coalescing: true
        properties:
          connection-string: [Some connection string]
          database-name: hazelcast
          collection-name: HTML_CACHE_only_new
          connections-per-host: 50
          min-connections-per-host: 10
          max-connection-idle-time: 60000
          max-connection-life-time: 120000
          max-wait-time: 5000`

The client is also version 5.2.3. Let me know if any other configuration info is needed. Hazelcast does form a 3-node cluster, but one of the nodes doesn't own any partitions. I would like to understand why that is happening and how to fix it.

Upvotes: 0

Views: 638

Answers (1)

Kambaa

Reputation: 485

I'm having similar issues with PartitionMigratingException. For a five-minute cache (with three sync backups and one async backup configured on the cluster) on my 3-node + Management Center cluster, it started to give me this error too, and it made my endpoints stop working.

From what I understand, we need to find a balance between cache timeouts, partitions (groupings of keys on a cache node; as I understand it, a migration happens when the inconsistency between them passes some threshold), and the cluster we set up. The problem in my case was that a migration was in progress on my node, and setting new cache entries is not allowed while that is happening.
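
Since writes are rejected only while the partition is migrating, one application-side option is to retry the write after a short pause. Below is a minimal sketch in plain Java, with no Hazelcast dependency. `RetryWithBackoff` and its parameters are hypothetical names made up for illustration, and the failing operation in `main` only simulates a migration. In real code the callable would wrap the `map.set(...)` call, and the catch clause would be narrowed to the exception actually seen (for example the `OperationTimeoutException` from the question's log). The Hazelcast client also retries internally until its invocation timeout, so this is a supplement to fixing the cluster, not a replacement.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation, doubling the pause after each failure. */
final class RetryWithBackoff {
    private RetryWithBackoff() {}

    static <T> T call(Callable<T> op, int maxAttempts, long initialDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (RuntimeException e) {
                // Assumption: the failure is transient (e.g. a migration in progress).
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated write: fails twice as if the partition were migrating, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Partition is migrating!");
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // stored after 3 attempts
    }
}
```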

From what I understand, I need to:

  • disable async backups (set async-backup-count to zero) to reduce the chance of long-running migrations caused by async backups.
  • lower the sync backup count (set backup-count to 1).
  • set a partition group configuration.
  • in the map config, add/edit the eviction settings (e.g. eviction-policy; older 3.x configs also had eviction-percentage, the percentage of entries to evict when the map reaches its maximum size).
  • in the map config, add/edit the max-size settings with memory in mind (max-size-policy and max-size; in 5.x these sit under eviction as max-size-policy and size).
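
To make the backup bullets above concrete, here is a small arithmetic sketch in Java. It assumes that each replica of a partition lives on a different member, so the number of copies kept per entry cannot exceed the number of data members. `ReplicaMath` and `copiesPerEntry` are hypothetical names used only for illustration, not part of any Hazelcast API.

```java
/** Hypothetical helper: how many copies of each entry a backup configuration asks for. */
final class ReplicaMath {
    private ReplicaMath() {}

    static int copiesPerEntry(int members, int syncBackups, int asyncBackups) {
        int requested = 1 + syncBackups + asyncBackups; // primary plus all backups
        // Assumption: every replica sits on a distinct member, so the cluster size is the cap.
        return Math.min(requested, members);
    }

    public static void main(String[] args) {
        // Three sync backups plus one async backup on a 3-node cluster.
        System.out.println(copiesPerEntry(3, 3, 1)); // 3
        // One sync backup and no async backup on the same cluster.
        System.out.println(copiesPerEntry(3, 1, 0)); // 2
    }
}
```

With three members, asking for four backups still yields only three copies, and every extra copy is more data to move when partitions are rebalanced. Dropping to one sync backup leaves two copies per entry.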

After searching for a while, I'm still in the process of finding a working balance.

In my Hazelcast config YAML file, I will be adding these settings and testing the result:

  partition-group:
    enabled: true
    group-type: HOST_AWARE
    member-group:               # only used when group-type is CUSTOM
      - 192.168.1.11
      - 192.168.1.12
      - 192.168.1.13
  map:
    my-map:
      backup-count: 1
      async-backup-count: 0
      time-to-live-seconds: 300  # 5 minutes
      max-idle-seconds: 0        # Entries won't expire due to idle time
      eviction:                  # same nesting as the eviction block in the question's 5.2.3 config
        eviction-policy: LRU
        max-size-policy: PER_NODE
        size: 10000              # Maximum size of the map per node
      # eviction-percentage: 25  # Evict 25% of entries at maximum size (3.x-era setting, not part of the 5.x eviction block)

Other things to consider are these settings (I have not verified them yet):

      # Enhancements:
      # 1. Migration interval and threshold:
      migration-interval: 120  # Migrate every 2 minutes (adjust based on workload)
      migration-threshold: 75  # Trigger migration when imbalance exceeds 75% (adjust based on impact tolerance)
      # 2. Near cache:
      near-cache:
        time-to-live-seconds: 300 # Cache entries locally for 5 minutes as well
      # 3. Max entries:
      max-entries: 20000 # Set a limit on total entries (optional, adjust based on memory and data volume)
      # 4. Merge interval:
      merge-interval: 300  # Merge smaller entries every 5 minutes (optimize memory usage)

I'm still in the research and testing phase, any updates will be good for anyone who has

Upvotes: 0
