Reputation: 1391
I've noticed a strange behaviour of Apache Ignite that occurs fairly reliably on my 5-node cluster but can be reproduced even with a two-node cluster. I use Apache Ignite.NET 2.7 on Linux, deployed in a Kubernetes cluster (each pod hosts one node).
The problem is as follows. Assume we have a cluster that consists of 2 Apache Ignite nodes, A and B. Both nodes start and initialize. A couple of Ignite services are deployed on each node during the initialization phase. Among others, a service named QuoteService is deployed on node B.
So far so good. The cluster works as expected. Then node B crashes or is stopped for whatever reason and restarts. All the Ignite services hosted on node B get redeployed. The node rejoins the cluster.
However, when a service on node A tries to call QuoteService, which is expected to be available on node B, an exception is thrown with the following message: Failed to find deployed service: QuoteService. This is strange, as the line registering the service did run during the restart of node B:
services.DeployMultiple("QuoteGenerator", new Services.Ignite.QuoteGenerator(), 8, 2);
(Deploying the service as a singleton does not make any difference.)
A restart of either node A or node B separately does not help. The problem can only be resolved by shutting down the entire Ignite cluster and restarting all the nodes.
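One way to narrow this down is to dump the service descriptors visible from node A after node B has rejoined. The following is only a minimal diagnostic sketch: the class and method names (ServiceDiagnostics, DumpDeployedServices) are made up for illustration, and it assumes an already started IIgnite instance is passed in.

```csharp
using System;
using Apache.Ignite.Core;
using Apache.Ignite.Core.Services;

// Hypothetical helper, not part of the original code.
public static class ServiceDiagnostics
{
    // Prints every service descriptor visible from the given node.
    public static void DumpDeployedServices(IIgnite ignite)
    {
        foreach (IServiceDescriptor descriptor in ignite.GetServices().GetServiceDescriptors())
        {
            Console.WriteLine("Service '{0}': total={1}, maxPerNode={2}, origin={3}",
                descriptor.Name, descriptor.TotalCount,
                descriptor.MaxPerNodeCount, descriptor.OriginNodeId);

            // TopologySnapshot maps a node id to the number of instances on that node.
            foreach (var entry in descriptor.TopologySnapshot)
            {
                Console.WriteLine("  node {0}: {1} instance(s)", entry.Key, entry.Value);
            }
        }
    }
}
```

If the service name is missing from this output, or its topology snapshot still lists a node id that is no longer in the cluster, that would suggest the deployment itself is stale rather than the call from node A being wrong.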
This condition can be reproduced even when 5 nodes are running.
This bug report may look a bit unspecific, but it is hard to give concrete reproduction steps, as reproducing the problem involves setting up at least two Ignite nodes and stopping and restarting them in sequence. So let me pose the questions this way:
1. Have you ever noticed such a condition, or have you received similar reports from other users?
2. If so, what steps can you recommend to address this problem?
3. Should I wait for the next version of Apache Ignite, as I read that the service deployment mechanism is currently being overhauled?
UPD: I am getting a similar problem on a running cluster even if I don't stop/start nodes. I will open another question on SA, as it seems to have a different cause.
Upvotes: 1
Views: 759
Reputation: 1391
I've figured out what caused the described behavior (although I don't understand why exactly).
I wanted to ensure that the Ignite service is only deployed on the current node, so I used the following C# code to deploy the service:
var services = ignite.GetCluster().ForLocal().GetServices();
services.DeployMultiple("FlatFileService", new Services.Ignite.FlatFileService(), 8, 2);
When I changed my code to rely only on a NodeFilter to limit the deployment of the service to a specific set of nodes and got rid of "GetCluster().ForLocal().", the bug disappeared. The final code is as follows:
var flatFileServiceCfg = new ServiceConfiguration
{
    Service = new Services.Ignite.FlatFileService(),
    Name = "FlatFileService",
    NodeFilter = new ProductServiceNodeFilter(),
    MaxPerNodeCount = 2,
    TotalCount = 8
};
var services = ignite.GetServices();
services.DeployAll(new[] { flatFileServiceCfg /* , ... other services... */ });
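The ProductServiceNodeFilter class is not shown above. As a rough sketch only, a node filter of this kind could select nodes by a user attribute. The attribute name "role" and the value "product-service" are assumptions made for illustration, not the values from my actual code:

```csharp
using System;
using Apache.Ignite.Core.Cluster;

// Hypothetical sketch of a node filter; the real ProductServiceNodeFilter may differ.
// Assumes that nodes meant to host the service set the user attribute
// "role" = "product-service" in their IgniteConfiguration.UserAttributes.
[Serializable]
public class ProductServiceNodeFilter : IClusterNodeFilter
{
    // Attribute name and value are illustrative assumptions.
    private const string RoleAttribute = "role";
    private const string RoleValue = "product-service";

    public bool Invoke(IClusterNode node)
    {
        string role;
        // Accept only nodes that advertise the expected role attribute.
        return node.TryGetAttribute(RoleAttribute, out role) && role == RoleValue;
    }
}
```

Unlike GetCluster().ForLocal(), a filter like this does not depend on the identity of one particular node instance, so a restarted node that carries the same attribute matches it again.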
It is still strange, however, that the old code worked until the topology changed.
Upvotes: 0