Reputation: 329
I have a Hazelcast cluster on two servers, with two nodes on each server. I run the nodes from the Hazelcast jar, so they restart whenever their server is restarted. During a deployment, the two servers restart about 30 seconds apart. After one such deployment, every time the application requested a particular piece of data, the exception below was thrown. I am using a MultiMap for caching data.
Caused by: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! this:Address[app01]:5701, partitionId: 0, operation: com.hazelcast.map.impl.operation.PutOperation, service: hz:impl:mapService
at com.hazelcast.spi.impl.BasicOperationService$OperationHandler.ensureNoPartitionProblems(BasicOperationService.java:833)
at com.hazelcast.spi.impl.BasicOperationService$OperationHandler.handle(BasicOperationService.java:741)
at com.hazelcast.spi.impl.BasicOperationService$OperationHandler.access$500(BasicOperationService.java:725)
at com.hazelcast.spi.impl.BasicOperationService$BasicDispatcherImpl.dispatch(BasicOperationService.java:576)
at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.process(BasicOperationScheduler.java:466)
at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.doRun(BasicOperationScheduler.java:458)
at com.hazelcast.spi.impl.BasicOperationScheduler$OperationThread.run(BasicOperationScheduler.java:432)
After the exception, I see several warnings like this one:
2015-04-10 14:51:03,403 WARN com.hazelcast.spi.impl.BasicInvocation - [app01]:5701 [dev] [3.4.2] Retrying invocation: BasicInvocation{ serviceName='hz:impl:mapService', op=PutOperation{alert-coms}, partitionId=0, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=100, callTimeout=60000, target=Address[app01]:5701, backupsExpected=0, backupsCompleted=0}, Reason: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! this:Address[app01]:5701, partitionId: 0, operation: com.hazelcast.map.impl.operation.PutOperation, service: hz:impl:mapService
I understand that the operation is being retried because the exception was thrown. The problem is that the partition migration never completed: it stayed stuck for a whole weekend, until I restarted the servers again.
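For reference, this is how I read the retry fields in that warning. Taken at face value, tryCount=250 and tryPauseMillis=500 describe a bounded retry loop with a fixed pause, which would give up after roughly two minutes. The sketch below is only my mental model of that, not Hazelcast's actual implementation: the class RetrySketch, its methods, and RetryableException are hypothetical names I made up for illustration.

```java
import java.util.function.Supplier;

public class RetrySketch {

    /** Hypothetical stand-in for a retryable failure such as PartitionMigratingException. */
    public static class RetryableException extends RuntimeException {
        public RetryableException(String message) {
            super(message);
        }
    }

    /** Upper bound on the time spent pausing between attempts, in milliseconds. */
    public static long retryBudgetMillis(int tryCount, long tryPauseMillis) {
        return (long) tryCount * tryPauseMillis;
    }

    /**
     * Runs op until it succeeds or tryCount attempts have failed,
     * sleeping tryPauseMillis between failed attempts.
     */
    public static <T> T invokeWithRetry(Supplier<T> op, int tryCount, long tryPauseMillis) {
        if (tryCount < 1) {
            throw new IllegalArgumentException("tryCount must be at least 1");
        }
        RetryableException last = null;
        for (int invokeCount = 1; invokeCount <= tryCount; invokeCount++) {
            try {
                return op.get();
            } catch (RetryableException e) {
                last = e;
                if (invokeCount < tryCount) {
                    pause(tryPauseMillis);
                }
            }
        }
        throw last; // every attempt failed, surface the last failure
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        // Values copied from the log line: tryCount=250, tryPauseMillis=500.
        System.out.println(retryBudgetMillis(250, 500) + " ms"); // prints "125000 ms"

        // Simulated operation that fails twice, then succeeds (1 ms pause to keep the demo fast).
        final int[] calls = {0};
        String result = invokeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("Partition is migrating!");
            }
            return "stored";
        }, 250, 1);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "stored after 3 attempts"
    }
}
```

If that reading is right, a single put should fail for good after about 125 seconds, so the retries alone do not explain a partition that stays in the migrating state for days.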
Please help me understand why this is happening and what measures could be taken to prevent it.
Thanks.
Upvotes: 5
Views: 1916
Reputation: 56
Your log shows you are on 3.4.2. We are aware of some issues with partition migration. In 3.7 we reworked the partition migration scheme from scratch, so try upgrading to 3.7.
Upvotes: 3