Reputation: 31
We are using Hazelcast 3.10.4 in our project, with second level caching enabled. Sometimes we get the exception below on a cluster of 3 Hazelcast nodes: 2 nodes on the same machine and 1 on a different machine, all on the same network. The exception occurs on one of the 2 co-located nodes, never on the node running on the other machine.
We get it after the servers have started, when we then perform some operations.
Case: Cache-1 and Cache-2 are both distributed Hazelcast caches (Hazelcast IMap). Cache-1 is fully initialized at server startup. Once the servers are up and running, Cache-1 is only read - no updates, no reloading. It contains all the data required for processing; for example, all system configuration defined at different levels, which users do not change once the system is up. This cache is distributed across the 3 nodes.
Cache-2 is only partially initialized at server startup. Once the servers are up and running, the system modifies Cache-2: deleting entries, modifying entries, and loading data for a source on demand if it is not present in the cache, based on configuration held in Cache-1 (partial on-demand loading).
Please note: the same Cache-1 lookup happens when loading data into Cache-2 during server startup, yet there is no failure at startup.
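For reference, a minimal sketch of how such a two-map setup could be wired up programmatically. The map name TermDataMap and the loader class are hypothetical; ConfigParamMap is the map name that appears in the stack trace below:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class CacheSetupSketch {
    public static void main(String[] args) {
        Config config = new Config();

        // Cache-1: fully populated at startup, read-only afterwards
        // ("ConfigParamMap" is the map name from the stack trace below).
        config.addMapConfig(new MapConfig("ConfigParamMap"));

        // Cache-2: partially loaded at startup; missing keys are loaded on demand
        // through a MapLoader. Map name and loader class are hypothetical.
        MapStoreConfig storeConfig = new MapStoreConfig()
                .setEnabled(true)
                .setClassName("com.example.TermDataMapLoader");
        config.addMapConfig(new MapConfig("TermDataMap").setMapStoreConfig(storeConfig));

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        // Cache-1 is filled once here and only read afterwards.
        IMap<String, Boolean> configParams = hz.getMap("ConfigParamMap");
        configParams.put("someConfigParam", Boolean.TRUE);
    }
}
```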
We get the exception below when the system tries to retrieve a value from Cache-1:
java.lang.IllegalThreadStateException: Thread[hz._hzInstance_1_tpt-val-js-master.partition-operation.thread-1,5,main] cannot make remote call: com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService', identityHash=1023062798, partitionId=234, replicaIndex=0, callId=0, invocationTime=-1 (1969-12-31 23:59:59.999), waitTimeout=-1, callTimeout=1800000, name=ConfigParamMap}
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:523)
at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:215)
at com.hazelcast.spi.impl.operationservice.impl.InvocationBuilderImpl.invoke(InvocationBuilderImpl.java:60)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:424)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:347)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.getInternal(NearCachedMapProxyImpl.java:114)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:116)
at com.tpt.atlant.grid.dao.hazelcast.DataHazelcastDAO.getData(DataHazelcastDAO.java:84)
at com.tpt.atlant.configparam.service.ConfigParamServiceImpl.getConfigParam(ConfigParamServiceImpl.java:33)
at com.tpt.atlant.configparam.service.ConfigParamServiceImpl.getBooleanValue(ConfigParamServiceImpl.java:52)
at com.tpt.atlant.configparam.service.ConfigParamServiceImpl$$FastClassBySpringCGLIB$$b698afe3.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
at org.springframework.cache.interceptor.CacheInterceptor$1.invoke(CacheInterceptor.java:52)
at org.springframework.cache.interceptor.CacheAspectSupport.invokeOperation(CacheAspectSupport.java:345)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:414)
at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327)
at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655)
at com.tpt.atlant.configparam.service.ConfigParamServiceImpl$$EnhancerBySpringCGLIB$$401b9858.getBooleanValue(<generated>)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.isBasisPl(TermDataJDBILoader.java:411)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.prepareInClauseFromList(TermDataJDBILoader.java:385)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.getCmdtyInstruments(TermDataJDBILoader.java:381)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.loadTermAttributes(TermDataJDBILoader.java:353)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.load(TermDataJDBILoader.java:263)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader.load(TermDataJDBILoader.java:1)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader$$FastClassBySpringCGLIB$$f032653.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:651)
at com.tpt.trade.grid.dao.loader.jdbi.TermDataJDBILoader$$EnhancerBySpringCGLIB$$b6e9411d.load(<generated>)
at com.tpt.atlant.grid.dao.loader.hazelcast.HazelcastMapLoader.load(HazelcastMapLoader.java:62)
at com.tpt.atlant.grid.dao.loader.hazelcast.HazelcastMapLoader.load(HazelcastMapLoader.java:33)
at com.tpt.atlant.grid.util.hazelcast.SpringContextMapLoaderFactory$DynamicSpringContectLoaderResolver.load(SpringContextMapLoaderFactory.java:223)
at com.tpt.atlant.grid.util.hazelcast.SpringContextMapLoaderFactory$DynamicSpringContectLoaderResolver.load(SpringContextMapLoaderFactory.java:179)
at com.hazelcast.map.impl.MapStoreWrapper.load(MapStoreWrapper.java:165)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.load(WriteThroughStore.java:72)
at com.hazelcast.map.impl.mapstore.writethrough.WriteThroughStore.load(WriteThroughStore.java:28)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.loadRecordOrNull(DefaultRecordStore.java:415)
at com.hazelcast.map.impl.recordstore.DefaultRecordStore.get(DefaultRecordStore.java:626)
at com.hazelcast.map.impl.operation.GetOperation.run(GetOperation.java:41)
at com.hazelcast.spi.Operation.call(Operation.java:148)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:202)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:191)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:120)
at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)
at ------ submitted from ------.(Unknown Source)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:127)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:162)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:425)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:347)
at com.hazelcast.map.impl.proxy.NearCachedMapProxyImpl.getInternal(NearCachedMapProxyImpl.java:114)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:116)
at com.tpt.trade.grid.dao.hazelcast.TermDataNumberHazelcastDAO.getTermData(TermDataNumberHazelcastDAO.java:54)
at com.tpt.valuation.position.grid.value.action.BatchTermValueAction.execute(BatchTermValueAction.java:107)
The Hazelcast code that throws this exception checks whether the partition id of the operation's key matches the partition thread handling the call; if not, it throws the exception. (There is a lot of other code around that check as well.)
Can anyone tell me the root cause of this? If there were a problem with the cache design, the exception should occur every time. We are not able to reproduce it on a developer machine with 2 or 3 nodes, but it is sometimes reproducible on the test machines.
Update: added the full stack trace. We implemented a MapLoader to load the data into the cache.
The reason might be that, during the write-through operation, the system tries to access data from a different cache using a different key.
This logic is a year old and has been working fine; we only recently started getting this exception, and only in the test environment.
Upvotes: 3
Views: 1719
Reputation: 4717
There is a bug in Hazelcast that may throw this exception in some cases: https://github.com/hazelcast/hazelcast/issues/24712
Upvotes: 0
Reputation: 31
We were able to reproduce the issue.
We maintain a second level cache, and because the Cache-1 data for these keys is kept in the second level cache during processing, the issue does not occur every time. To reproduce it, we cleared the second level cache before every cache2.get(key).
If no value is found for a key in the Cache-2 Hazelcast map, the system calls the load method of the MapLoader. This is where the partition thread comes into the picture, as part of the write-through operation.
In the load method of the Cache-2 MapLoader, we access data from Cache-1 using a key of a different type. Because the key types of Cache-1 and Cache-2 differ, the keys map to different partitions, and the resulting cross-partition call from the partition thread fails with the "cannot make remote call" exception.
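To illustrate, here is a minimal sketch of the problematic pattern, assuming simplified key/value types (Long keys for Cache-2, String keys for Cache-1) and hypothetical helper names; only ConfigParamMap is taken from the stack trace:

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IMap;
import com.hazelcast.core.MapLoader;

import java.util.Collection;
import java.util.Map;

// Simplified placeholder types: Long keys for Cache-2, String keys for Cache-1.
public class TermDataMapLoader implements MapLoader<Long, String>, HazelcastInstanceAware {

    private transient HazelcastInstance hz;

    @Override
    public void setHazelcastInstance(HazelcastInstance hz) {
        this.hz = hz;
    }

    @Override
    public String load(Long key) {
        // load() runs on the partition thread that owns `key` in Cache-2.
        IMap<String, Boolean> cache1 = hz.getMap("ConfigParamMap");

        // PROBLEM: this Cache-1 key usually belongs to a partition handled by a
        // different partition thread (or another member), so the get() needs a
        // remote call from a partition thread, which Hazelcast rejects with
        // IllegalThreadStateException "cannot make remote call".
        Boolean flag = cache1.get("someConfigParam");

        return loadFromDatabase(key, flag);
    }

    @Override
    public Map<Long, String> loadAll(Collection<Long> keys) {
        return null; // omitted in this sketch
    }

    @Override
    public Iterable<Long> loadAllKeys() {
        return null; // returning null disables eager key loading
    }

    // Hypothetical stand-in for the real database lookup.
    private String loadFromDatabase(Long key, Boolean flag) {
        return "term-data-" + key + "-" + flag;
    }
}
```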
Question: is there any way to make this work if such a cross-cache lookup is really required?
Solution-1: We used an IExecutorService in the load method of the Cache-2 MapLoader to fetch the value from Cache-1 (see the sketch below). This works fine, but it may increase data-loading time because an extra thread is involved.
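A minimal sketch of what Solution-1 could look like, assuming a hypothetical ConfigLookupTask class and executor name. The task is Serializable because the executor may run it on another member, and HazelcastInstanceAware so it can obtain the Cache-1 proxy:

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.IMap;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

// Hypothetical task that reads a Cache-1 entry on an executor thread instead of a partition thread.
public class ConfigLookupTask implements Callable<Boolean>, Serializable, HazelcastInstanceAware {

    private final String configKey;
    private transient HazelcastInstance hz;

    public ConfigLookupTask(String configKey) {
        this.configKey = configKey;
    }

    @Override
    public void setHazelcastInstance(HazelcastInstance hz) {
        this.hz = hz;
    }

    @Override
    public Boolean call() {
        // Runs on an executor thread, so the Cache-1 get() is no longer
        // issued from a partition thread.
        IMap<String, Boolean> cache1 = hz.getMap("ConfigParamMap");
        return cache1.get(configKey);
    }

    // Called from the Cache-2 MapLoader.load(): offload the Cache-1 read and wait for the result.
    public static Boolean lookup(HazelcastInstance hz, String configKey) throws Exception {
        IExecutorService executor = hz.getExecutorService("cache1-lookup"); // hypothetical executor name
        Future<Boolean> future = executor.submit(new ConfigLookupTask(configKey));
        return future.get(); // this blocking wait is the extra loading time mentioned above
    }
}
```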
Solution-2: Set the backup count for Cache-1 to (number of nodes - 1), so that every node holds a copy of the map; a config sketch follows. With this, the exception never occurs. There is very little chance of Cache-1 changing at runtime (roughly once a week), and it holds no more than 1000 records.
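A minimal config sketch of Solution-2, assuming a 3-member cluster. Note that read-backup-data is an added assumption here: Hazelcast only serves get() from a local backup replica when that flag is enabled, otherwise the get still goes to the partition owner:

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;

public class Cache1BackupConfigSketch {
    public static void main(String[] args) {
        Config config = new Config();
        config.addMapConfig(new MapConfig("ConfigParamMap")
                // nodes - 1 for a 3-member cluster: every member holds a replica
                .setBackupCount(2)
                // assumption: serve get() from the local backup replica so that
                // no remote GetOperation has to be invoked from a partition thread
                .setReadBackupData(true));
        Hazelcast.newHazelcastInstance(config);
    }
}
```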
Is there any other way to achieve this?
Upvotes: 0