Reputation: 1306
I run Ignite server nodes inside my application with the configuration below. The application is clustered, so there can be multiple Ignite servers.
The Ignite configuration looks like this:
@Bean
public Ignite igniteInstance(JdbcIpFinderDialect ipFinderDialect, DataSource dataSource) {
    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setGridLogger(new Slf4jLogger());
    cfg.setMetricsLogFrequency(0);
    TcpDiscoverySpi discoSpi = new TcpDiscoverySpi()
        .setIpFinder(new TcpDiscoveryJdbcIpFinder(ipFinderDialect)
            .setDataSource(dataSource)
            .setInitSchema(false));
    cfg.setDiscoverySpi(discoSpi);
    cfg.setCacheConfiguration(cacheConfigurations.toArray(new CacheConfiguration[0]));
    cfg.setFailureDetectionTimeout(igniteFailureDetectionTimeout);
    return Ignition.start(cfg);
}
But after running for a day or so, Ignite falls over with errors along the lines of the following:
o.a.i.spi.discovery.tcp.TcpDiscoverySpi : Node is out of topology (probably, due to short-time network problems
o.a.i.i.m.d.GridDiscoveryManager : Local node SEGMENTED: TcpDiscoveryNode [id=db3eb958-df2c-4211-b2b4-ba660bc810b0, addrs=[10.0.0.1], sockAddrs=[sd-9fdb-a8cb.nam.nsroot.net/10.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1612755975209, loc=true, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]
ROOT : Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SEGMENTATION, err=null]]
o.a.i.i.p.failure.FailureProcessor : Ignite node is in invalid state due to a critical failure.
ROOT : Stopping local node on Ignite failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]
o.a.i.i.m.d.GridDiscoveryManager : Node FAILED: TcpDiscoveryNode [id=4d84f811-1c04-4f80-b269-a0003fbf7861, addrs=[10.0.0.1], sockAddrs=[sd-dc95-412b.nam.nsroot.net/10.0.0.1:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1612707966704, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]
o.a.i.i.p.cache.GridCacheProcessor : Stopped cache [cacheName=cacheOne]
o.a.i.i.p.cache.GridCacheProcessor : Stopped cache [cacheName=cacheTwo]
Whenever my applications' client nodes then try to write to the server cache, they fail with this error:
java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): cacheOne
I am looking for a way to restart my Ignite server node if it fails with a SEGMENTATION failure, or any other failure. Some suggestions say I have to extend AbstractFailureHandler and register that implementation via setFailureHandler, but I could not find any examples.
Upvotes: 0
Views: 649
Reputation: 19343
You cannot restart an Ignite server node, so if you're using it in a Spring context you need a new context (which usually means restarting the application).
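For reference, a custom failure handler of the kind the question asks about might look like the sketch below. It assumes the Ignite 2.7.x `AbstractFailureHandler` API. `RestartRequestingFailureHandler`, `restartHook` and `withRestartHook` are hypothetical names, and what the hook does (for example, closing the Spring context and exiting so that an external supervisor starts the application again) is left to the application. The handler does not restart the node itself; it only signals that a restart is needed.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.failure.AbstractFailureHandler;
import org.apache.ignite.failure.FailureContext;
import org.apache.ignite.failure.FailureType;

/** Hypothetical handler: passes critical failures to an application-supplied restart hook. */
public class RestartRequestingFailureHandler extends AbstractFailureHandler {
    /** Hypothetical hook supplied by the application, e.g. "close the context and exit". */
    private final Runnable restartHook;

    public RestartRequestingFailureHandler(Runnable restartHook) {
        this.restartHook = restartHook;
    }

    @Override
    protected boolean handle(Ignite ignite, FailureContext failureCtx) {
        FailureType type = failureCtx.type();
        System.err.println("Ignite failure detected, requesting restart: " + type);

        // The handler is invoked from an Ignite thread, so the (possibly slow)
        // hook runs on its own thread instead of blocking the caller.
        new Thread(restartHook, "ignite-restart-hook").start();

        // true tells Ignite that the node must be treated as invalid from now on.
        return true;
    }

    /** Registers the handler on an existing configuration and returns that configuration. */
    public static IgniteConfiguration withRestartHook(IgniteConfiguration cfg, Runnable restartHook) {
        return cfg.setFailureHandler(new RestartRequestingFailureHandler(restartHook));
    }
}
```

In the bean method from the question this would be called before `Ignition.start(cfg)`. As with the `StopNodeFailureHandler` shown in the log, failure types in the handler's ignored set (`SYSTEM_WORKER_BLOCKED` and `SYSTEM_CRITICAL_OPERATION_TIMEOUT` in that log) never reach `handle`.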
A client node will try to reconnect, but if it can't, it likewise needs a new context.
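On the client side, waiting for that reconnect could be sketched roughly as follows. This assumes the cache operation fails with a `CacheException` caused by `IgniteClientDisconnectedException`, which is how Ignite reports a disconnected client; `ReconnectingCacheWriter` and `putWithReconnect` are hypothetical names. It does not cover the `IllegalStateException` / `CacheStoppedException` case from the question, which propagates unchanged.

```java
import javax.cache.CacheException;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteClientDisconnectedException;

/** Hypothetical helper: retries a put once after the client node has reconnected. */
public final class ReconnectingCacheWriter {
    private ReconnectingCacheWriter() {
    }

    public static <K, V> void putWithReconnect(IgniteCache<K, V> cache, K key, V val) {
        try {
            cache.put(key, val);
        } catch (CacheException e) {
            // Anything other than a client disconnect is rethrown as-is.
            if (!(e.getCause() instanceof IgniteClientDisconnectedException)) {
                throw e;
            }

            IgniteClientDisconnectedException disconnected =
                (IgniteClientDisconnectedException) e.getCause();

            // Blocks until the client has rejoined the cluster; throws if the
            // reconnect attempt itself fails.
            disconnected.reconnectFuture().get();

            // Single retry after a successful reconnect.
            cache.put(key, val);
        }
    }
}
```

If `reconnectFuture().get()` throws, the client is in the situation described above and needs a fresh context.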
Upvotes: 1