I'm encountering an issue with an embedded Infinispan clustered cache in my application. The scenario is as follows.
Setup Overview
I have a distributed setup with two nodes, each running an Infinispan clustered cache embedded in a Spring Boot application. The nodes communicate over the network to form a cluster and share cache data.
Problem Description
When the network connection between the nodes is lost, the cluster splits and each node ends up in its own single-member view, as the log from node2 shows:
2024-05-07 03:39:29,677 INFO o.i.r.t.j.JGroupsTransport [VERIFY_SUSPECT.TimerThread-92,node2] ISPN000094: Received new cluster view for channel my-cluster: [node2|2] (1) [node2]
2024-05-07 03:39:29,678 INFO o.i.u.l.e.i.BasicEventLogger [VERIFY_SUSPECT.TimerThread-92,node2] ISPN100001: Node node1 left the cluster
Some time later, when the network connection is restored, the two subgroups merge back into a single cluster:
2024-05-07 03:40:19,272 INFO o.i.r.t.j.JGroupsTransport [jgroups-82,node2] ISPN000093: Received new, MERGED cluster view for channel my-cluster: MergeView::[node1|3] (2) [node1, node2], 2 subgroups: [node2|2] (1) [node2], [node1|2] (1) [node1]
2024-05-07 03:40:19,273 INFO o.i.u.l.e.i.BasicEventLogger [jgroups-82,node2] ISPN100000: Node node1 joined the cluster
However, after the merge, the Infinispan cache event listener on one of the nodes stops working without logging any errors. Which node is affected appears to be random: sometimes it is node1, sometimes node2.
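To make the merged-view line above easier to read when correlating it with the node whose listener stops, the subgroups can be pulled out of the log text. This is only a minimal sketch that assumes the ISPN000093 line keeps the format shown above; the class name `MergeViewLogParser` and its `subgroups` method are hypothetical helpers, not part of Infinispan or JGroups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MergeViewLogParser {

    // Matches one view such as "[node1|3] (2) [node1, node2]":
    // group 1 = view creator, group 2 = view id, group 3 = member list.
    private static final Pattern VIEW =
            Pattern.compile("\\[([^|\\]]+)\\|(\\d+)\\]\\s+\\(\\d+\\)\\s+\\[([^\\]]*)\\]");

    /** Returns the member list of each subgroup listed after "subgroups:" in the log line. */
    public static List<List<String>> subgroups(String logLine) {
        List<List<String>> result = new ArrayList<>();
        int start = logLine.indexOf("subgroups:");
        if (start < 0) {
            return result; // not a merged-view line
        }
        Matcher m = VIEW.matcher(logLine.substring(start));
        while (m.find()) {
            result.add(Arrays.asList(m.group(3).split("\\s*,\\s*")));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "ISPN000093: Received new, MERGED cluster view for channel my-cluster: "
                + "MergeView::[node1|3] (2) [node1, node2], 2 subgroups: "
                + "[node2|2] (1) [node2], [node1|2] (1) [node1]";
        System.out.println(subgroups(line)); // prints [[node2], [node1]]
    }
}
```

For the log line in this question, it reports two single-member subgroups, which matches the split described above.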
Additional Information
This is the cache event listener implementation. It worked fine before the network disconnection.
@Listener(clustered = true)
public class CacheEventListener {

    private static final Logger LOGGER = LogManager.getLogger(CacheEventListener.class);

    @CacheEntryCreated
    public void entryCreated(CacheEntryCreatedEvent event) {
        LOGGER.info("Cache created event for {}", event.getCache().getName());
    }

    @CacheEntryModified
    public void entryModified(CacheEntryModifiedEvent event) {
        LOGGER.info("Cache modified event for {}", event.getCache().getName());
    }
}
There are no logged errors or exceptions related to the cache event listener or Infinispan configuration.
This is my JGroups configuration:
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-4.2.xsd">
    <!-- jgroups.tcp.address is deprecated and will be removed, see ISPN-11867 -->
    <TCP bind_addr="${jgroups.bind.address,jgroups.tcp.address:SITE_LOCAL}"
         bind_port="${jgroups.bind.port,jgroups.tcp.port:7805}"
         client_bind_port="7004"
         enable_diagnostics="false"
         thread_naming_pattern="pl"
         send_buf_size="640k"
         sock_conn_timeout="300"
         bundler_type="no-bundler"
         thread_pool.min_threads="${jgroups.thread_pool.min_threads:2}"
         thread_pool.max_threads="${jgroups.thread_pool.max_threads:200}"
         thread_pool.keep_alive_time="60000"
         thread_dumps_threshold="${jgroups.thread_dumps_threshold:10000}"
    />
    <TCPPING async_discovery="true"
             initial_hosts="192.168.1.7[7805]"
             port_range="0"
    />
    <MERGE3 min_interval="10000"
            max_interval="30000"
    />
    <FD_SOCK/>
    <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
    <FD_ALL timeout="10000"
            interval="2000"
            timeout_check_interval="1000"
    />
    <VERIFY_SUSPECT timeout="1000"/>
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    xmit_interval="100"
                    xmit_table_num_rows="50"
                    xmit_table_msgs_per_row="1024"
                    xmit_table_max_compaction_time="30000"
                    resend_last_seqno="true"
    />
    <UNICAST3 xmit_interval="100"
              xmit_table_num_rows="50"
              xmit_table_msgs_per_row="1024"
              xmit_table_max_compaction_time="30000"
    />
    <pbcast.STABLE stability_delay="500"
                   desired_avg_gossip="5000"
                   max_bytes="1M"
    />
    <pbcast.GMS print_local_addr="false"
                join_timeout="${jgroups.join_timeout:2000}"
    />
    <UFC max_credits="4m"
         min_threshold="0.40"
    />
    <MFC max_credits="4m"
         min_threshold="0.40"
    />
    <FRAG3/>
</config>
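As a rough sanity check on the failure-detection settings above, the heartbeat path (FD_ALL followed by VERIFY_SUSPECT) implies a window in which a silent node gets reported as suspected. The sketch below only does that arithmetic. It assumes VERIFY_SUSPECT waits its full `timeout` before confirming the suspicion and it ignores FD_SOCK, which can react sooner when a socket closes. The class and method names are hypothetical.

```java
public class SuspectTimingEstimate {

    /**
     * Returns {earliest, latest} delay in milliseconds between the last heartbeat
     * and the point where the suspicion is confirmed, for the heartbeat path only.
     */
    public static long[] suspicionWindowMillis(long fdAllTimeout,
                                               long timeoutCheckInterval,
                                               long verifySuspectTimeout) {
        // FD_ALL suspects a member between `timeout` and
        // `timeout + timeout_check_interval` after its last heartbeat.
        long earliest = fdAllTimeout + verifySuspectTimeout;
        long latest = fdAllTimeout + timeoutCheckInterval + verifySuspectTimeout;
        return new long[] {earliest, latest};
    }

    public static void main(String[] args) {
        // Values taken from the configuration above:
        // FD_ALL timeout=10000, timeout_check_interval=1000, VERIFY_SUSPECT timeout=1000.
        long[] window = suspicionWindowMillis(10_000, 1_000, 1_000);
        System.out.println(window[0] + " to " + window[1] + " ms"); // prints 11000 to 12000 ms
    }
}
```

With these values a node that stops sending heartbeats would be removed from the view roughly 11 to 12 seconds later, so any network interruption longer than that should be enough to trigger the split and the later merge.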
Questions