Reputation: 25
I have two Apache Ignite servers with three clients already connected to them, each using separate data region configurations for their respective caches. These three clients work fine, but now, when I connect a fourth client, the node occasionally stops.
WARN 1 --- [vent-worker-#44] o.a.i.i.m.d.GridDiscoveryManager : Node FAILED:
TcpDiscoveryNode [id=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3, consistentId=6ff310ca-dd51-4115-9fdf-fbf3d093b5b3,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, x.y.z.a, 127.0.0.1], sockAddrs=null, discPort=0, order=967,
intOrder=487, lastExchangeTime=1731680665767, loc=false, ver=2.15.0#20230425-sha1:f98f7f35, isClient=true]
Whenever I get this error, my entire Spring Boot application restarts from the beginning. Why is this happening, and how can I avoid it? Below is my configuration:
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.EventType;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
import org.apache.ignite.spi.eventstorage.memory.MemoryEventStorageSpi;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class IgniteConfig {
    @Bean
    public Ignite igniteInstance() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Set metrics log frequency to zero to reduce logging noise
        cfg.setMetricsLogFrequency(0);

        // Set client mode
        cfg.setClientMode(true);

        // Configure discovery SPI
        TcpDiscoverySpi discoverySpi = new TcpDiscoverySpi();
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(Arrays.asList(
            "x.y.z.a:47500..47509"
        ));
        discoverySpi.setIpFinder(ipFinder);
        discoverySpi.setNetworkTimeout(10000); // Network timeout (10 seconds)
        discoverySpi.setJoinTimeout(10000); // Join timeout (10 seconds)
        cfg.setDiscoverySpi(discoverySpi);

        // Set failure detection timeouts
        cfg.setFailureDetectionTimeout(120000); // 120 seconds
        cfg.setClientFailureDetectionTimeout(120000); // 120 seconds

        // Configure TCP communication SPI
        TcpCommunicationSpi spi = new TcpCommunicationSpi();
        spi.setConnectTimeout(30000); // Initial connection timeout (30 seconds)
        spi.setMaxConnectTimeout(10000); // Max connection timeout (10 seconds)
        spi.setReconnectCount(3); // Number of reconnection attempts
        spi.setIdleConnectionTimeout(3000); // Idle connection timeout (3 seconds)
        cfg.setCommunicationSpi(spi);

        // Configure event logging to capture node failures and disconnections
        cfg.setIncludeEventTypes(
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_LEFT,
            EventType.EVT_NODE_JOINED,
            EventType.EVT_NODE_SEGMENTED
        );

        // Configure event storage for diagnostics
        MemoryEventStorageSpi eventStorageSpi = new MemoryEventStorageSpi();
        eventStorageSpi.setExpireCount(1000); // Store up to 1000 events in memory
        cfg.setEventStorageSpi(eventStorageSpi);

        // Start the Ignite instance
        return Ignition.start(cfg);
    }
}
Upvotes: 0
Views: 46
Reputation: 149
First, why are you using three separate data regions? It may well be fine to do so, but it pre-divides your memory space. That is not a problem in itself unless one of your applications needs more than its slice of the pie. If all three were in the same data region, you would be limited only by total memory, since all consumers draw from the one and only pie.
As for the node failure, you would need to look at the log files to see whether they indicate a cause. I have seen long GC pauses ultimately cause a node to crash. I can't say that is your issue, but if your log showed long GC pauses before a crash, that would be one example of a reason for node failure. Hope that helps.
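If you want to check the GC theory from inside the client application, here is a minimal sketch using only the standard java.lang.management API. GcPauseProbe, its method names, and the 1-second threshold are made up for illustration and are not part of Ignite. Note that getCollectionTime() reports accumulated collection time, which only approximates stop-the-world pause time with concurrent collectors, so treat the output as a hint rather than proof.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper (not an Ignite class): samples cumulative GC time of this JVM. */
final class GcPauseProbe {

    /** Total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** True if the GC time accumulated between two samples is above the threshold. */
    static boolean exceeds(long beforeMillis, long afterMillis, long thresholdMillis) {
        return afterMillis - beforeMillis > thresholdMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long thresholdMillis = 1_000; // assumed threshold, tune for your workload
        long prev = totalGcMillis();
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1_000); // sample once per second
            long now = totalGcMillis();
            if (exceeds(prev, now, thresholdMillis)) {
                System.out.println("WARN: " + (now - prev) + " ms of GC in the last interval");
            }
            prev = now;
        }
    }
}
```

In a real application you would run the sampling loop on a background thread and compare the warning timestamps with the time of the "Node FAILED" message. Enabling GC logging on the JVM gives the same information with more detail.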
Upvotes: 1