Reputation: 746
We are running Ignite 2.4 and use IgniteSpringBean and Spring Data repositories for cluster and cache access.
Since we have been hitting a lot of IgniteClientDisconnectedException-related issues, I am writing a manual segmentation resolver. Automatic client reconnection is disabled (clientReconnectDisabled set to true). The resolver detects the disconnected condition with a simple cache query that runs periodically, then initiates a disconnect via IgniteSpringBean#close followed by a reconnect using the code fragment below (very similar to this discussion: http://apache-ignite-users.70518.x6.nabble.com/SPI-has-already-been-started-always-create-new-configuration-instance-for-each-starting-Ignite-instar-td7360.html).
Code fragment in bean DCMIgniteSpringBean#reconnect() referenced below in XML config:
public final void reconnect(final IgniteConfiguration specifiedIgniteConfiguration) {
    LOGGER.info("Initiating reconnect..");
    try {
        close();
        //destroy();
    } catch (Exception e) {
        LOGGER.warn("Error while disconnecting", e);
    }
    LOGGER.info("Disconnected..");
    try {
        Thread.sleep(1000);
    } catch (Exception e) {
        LOGGER.warn("Error while pausing to reconnect", e);
    }
    setConfiguration(specifiedIgniteConfiguration);
    afterSingletonsInstantiated();
    final CacheConfiguration[] cfgArray = specifiedIgniteConfiguration.getCacheConfiguration();
    LOGGER.info("Cache configuration is : {}", cfgArray);
    getOrCreateCaches(Arrays.asList(cfgArray));
    LOGGER.info("Reconnected..");
}
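For readers who want to picture the detection side described above, here is a minimal sketch of a periodic health check that triggers a reconnect when a probe fails. It is not the code from ClientHealthBasedReconnectWrapper (which is not shown in this post); the class name ClientHealthMonitor and its two callbacks are hypothetical, and it uses only the JDK so that no Ignite API is assumed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: runs a probe and triggers a reconnect when the probe fails. */
final class ClientHealthMonitor {
    private final Callable<?> probe;   // assumption: e.g. a cheap cache query
    private final Runnable reconnect;  // assumption: e.g. a call to the bean's reconnect(...)
    private final AtomicBoolean reconnecting = new AtomicBoolean(false);

    ClientHealthMonitor(Callable<?> probe, Runnable reconnect) {
        this.probe = probe;
        this.reconnect = reconnect;
    }

    /** Runs one health check; returns true if a reconnect was attempted. */
    boolean checkOnce() {
        try {
            probe.call();
            return false; // probe succeeded, nothing to do
        } catch (InterruptedException interrupted) {
            // Preserve the interrupt flag instead of swallowing it.
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception probeFailure) {
            // Guard against overlapping reconnect attempts from concurrent checks.
            if (!reconnecting.compareAndSet(false, true)) {
                return false;
            }
            try {
                reconnect.run();
                return true;
            } finally {
                reconnecting.set(false);
            }
        }
    }
}
```

In a Spring setup, checkOnce() could be called from a scheduled method, or directly from a ScheduledExecutorService via scheduleWithFixedDelay(monitor::checkOnce, ...). The guard flag is there because a slow reconnect could otherwise overlap with the next scheduled check.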
The XML bean config fragment:
<bean id="igniteInstance" class="com.brocade.dcm.configuration.DCMIgniteSpringBean">
    <property name="configuration" ref="grid.cfg"/>
</bean>
<bean id="grid.cfg.provider" class="com.brocade.dcm.configuration.ClientHealthBasedReconnectWrapper">
    <lookup-method name="createIgniteConfiguration" bean="grid.cfg"/>
</bean>
<bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration" scope="prototype">
    ...
    ...
</bean>
With the above I got this to work: my extended IgniteSpringBean client reconnects properly and starts all the caches as well.
However, even though the client is connected and the caches are started, all subsequent calls/queries on the existing IgniteCache and IgniteRepository instances fail with CacheStoppedException (below), which leaves them unusable.
Can someone suggest how I could refresh these references? I know that when the client reconnects automatically after a disconnect, the references continue to work, which tells me there is a way to refresh them that I am not using.
Any expert ideas on how to achieve this? It feels like I am close, but still far, given that I am relying on hacks :-(
Below are the exceptions I get for IgniteCache#query() and IgniteRepository#findByXXX() calls, respectively:
class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): FabricInfoCache
at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1684)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:365)
at com.brocade.dcm.configuration.ClientHealthBasedReconnectWrapper.monitorHealth(ClientHealthBasedReconnectWrapper.java:110)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[Request processing failed; nested exception is java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): WebsocketCacheInfo] with root cause
class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): WebsocketCacheInfo
at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:164)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1684)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:365)
at org.apache.ignite.springdata.repository.query.IgniteRepositoryQuery.execute(IgniteRepositoryQuery.java:117)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.doInvoke(RepositoryFactorySupport.java:483)
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.invoke(RepositoryFactorySupport.java:461)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.data.projection.DefaultMethodInvokingMethodInterceptor.invoke(DefaultMethodInvokingMethodInterceptor.java:61)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.data.repository.core.support.SurroundingTransactionDetectorMethodInterceptor.invoke(SurroundingTransactionDetectorMethodInterceptor.java:57)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
at com.sun.proxy.$Proxy182.findByWebsocketSessionId(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:333)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
at com.sun.proxy.$Proxy124.findByWebsocketSessionId(Unknown Source)
Thanks, Muthu
Upvotes: 0
Views: 378
Reputation: 746
For others facing this issue: I fixed the problem by building from source and patching GatewayProtectedCacheProxy#checkProxyIsValid and GridCacheContext.
Special thanks to @Michael for sharing the related issue which helped get to this solution.
Basically, when Ignite is stopped and restarted, the wrapped cache proxy references (for IgniteCache/IgniteRepository) that were handed out earlier end up with a stale kernel context, because the kernel is stopped and restarted as a new instance. The (Spring) application still holds these references (from various injections), and subsequent calls through them fail. The fix was to check whether there is a running kernel instance for the same Ignite instance name and, if so, update the proxy references once a cache with the same name has been started and is available.
private GridCacheGateway<K, V> checkProxyIsValid(@Nullable GridCacheGateway<K, V> gate, boolean tryRestart) {
    ..
    ..
    if (isCacheProxy && tryRestart && gate.isStopped() &&
        context().kernalContext().gateway().getState() == GridKernalState.STOPPED) {
        IgniteKernal igniteKernal = (IgniteKernal) Ignition.ignite(context().gridConfig().getIgniteInstanceName());
        if (igniteKernal != null) {
            context().setGridKernalContext(igniteKernal.context());
        }
    }
    if (isCacheProxy && tryRestart && gate.isStopped() &&
        context().kernalContext().gateway().getState() == GridKernalState.STARTED) {
        IgniteCacheProxyImpl proxyImpl = (IgniteCacheProxyImpl) delegate;
        try {
            IgniteInternalCache<K, V> cache = context().kernalContext().cache().<K, V>publicJCache(context().name()).internalProxy();
            GridFutureAdapter<Void> fut = proxyImpl.opportunisticRestart();
            if (fut == null)
                proxyImpl.onRestarted(cache.context(), cache.context().cache());
            else
                new IgniteFutureImpl<>(fut).get();
            return gate();
        } catch (IgniteCheckedException ice) {
            // Opportunity didn't work out.
        }
    }
    return gate;
}

/**
 * NOTE : This method goes into GridCacheContext.java
 * @param ctx
 */
public void setGridKernalContext(GridKernalContext ctx) {
    this.ctx = ctx;
}
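As a side note for anyone who cannot patch Ignite itself: the underlying problem (injected references going stale after a restart) can in principle also be approached from the application side, by injecting a thin proxy that looks up the current instance on every call. The sketch below is a different technique from the patch above, not part of it. It uses only the JDK's dynamic proxies; RefreshingProxy and CacheHandle are hypothetical names, and whether this can be wired into IgniteRepository instances is not something I have verified.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

/** Hypothetical stand-in for whatever interface the application injects. */
interface CacheHandle {
    String describe();
}

/** Hypothetical helper: builds proxies that re-resolve their target on every call. */
final class RefreshingProxy {
    private RefreshingProxy() {
    }

    /**
     * Returns a proxy for the given interface. Each method call asks the
     * lookup for the current target, so callers never hold a stale instance.
     */
    @SuppressWarnings("unchecked")
    static <T> T of(Class<T> iface, Supplier<? extends T> lookup) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] {iface},
            (proxy, method, args) -> {
                try {
                    return method.invoke(lookup.get(), args);
                } catch (InvocationTargetException e) {
                    // Rethrow the target's own exception, not the reflection wrapper.
                    throw e.getCause();
                }
            });
    }
}
```

With this in place, the application would inject the proxy once, and the lookup (for example, an AtomicReference updated after each restart) decides which real instance serves each call. The cost is one reflective call per invocation.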
Upvotes: 0
Reputation: 650
I believe this should be fixed in 2.5:
https://issues.apache.org/jira/browse/IGNITE-2766
Please try this version.
Upvotes: 1