Reputation: 11361
I'm trying to understand the logic Service Fabric uses to consider a node in a cluster unhealthy.
I recently deployed a new version of our application that had 3 unhealthy worker services running on all nodes. They are very light services that load messages from a queue, but because of their frequent failures, all other services running on the same node were affected for some reason, so all services are reported as unhealthy.
I assume this behavior is Service Fabric's health monitoring deciding that the node is not healthy because multiple services are failing on the same node. Is this right?
What measures does SF use to consider a node unhealthy?
Upvotes: 1
Views: 1167
Reputation: 2599
Service Fabric's health model is described in detail here. The measures are always "health reports". Service Fabric emits some health reports on its own, but the model is also extensible and you can add your own.
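As a rough illustration of the extensibility point, here is a minimal sketch of sending a custom health report against a node from PowerShell. It assumes a local development cluster at localhost:19000 and a node named Node1; the source ID "QueueWorkerWatchdog" and the property "QueueBacklog" are hypothetical names made up for this example, not anything Service Fabric defines.

```powershell
# Assumption: a local dev cluster; replace the endpoint with your own.
Connect-ServiceFabricCluster -ConnectionEndpoint "localhost:19000"

# Hypothetical watchdog report: SourceId and HealthProperty are
# illustrative names chosen for this sketch.
Send-ServiceFabricNodeHealthReport `
    -NodeName "Node1" `
    -HealthState Warning `
    -SourceId "QueueWorkerWatchdog" `
    -HealthProperty "QueueBacklog" `
    -Description "Queue workers on this node are restarting frequently." `
    -TimeToLiveSec 300 `
    -RemoveWhenExpired   # drop the report once the TTL passes instead of treating expiry as an error
```

A report like this shows up alongside the system-generated ones when you look at the node's health, so a custom watchdog can influence the node's aggregated state in the same way the built-in reports do.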
Whether you've added your own health reports or are relying only on what the system emits by default, you can see which health reports are being emitted for a given node either by selecting that node in SFX (Service Fabric Explorer) or by running a command like the following:
Get-ServiceFabricNodeHealth -NodeName Node1
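To narrow that output down to the reports that are actually making the node unhealthy, something like the following sketch may help. It assumes you are already connected to the cluster with Connect-ServiceFabricCluster, that the node is named Node1, and that the returned object exposes AggregatedHealthState and a HealthEvents collection whose items carry a HealthInformation property (as the .NET NodeHealth type does).

```powershell
# Assumption: Connect-ServiceFabricCluster has already been run in this session.
$health = Get-ServiceFabricNodeHealth -NodeName "Node1"

# Overall state for the node, aggregated from the reports on it
$health.AggregatedHealthState

# List only the reports that are not Ok, with who sent them and why
$health.HealthEvents |
    Where-Object { $_.HealthInformation.HealthState -ne 'Ok' } |
    ForEach-Object {
        $_.HealthInformation |
            Select-Object SourceId, Property, HealthState, Description
    }
```

The SourceId column tells you whether a report came from a system component or from something you added, which is usually the quickest way to see why a node was marked as Warning or Error.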
As we saw in the doc, Node health is mainly determined by
In these cases SF tries to gather as much information as it can about what failed (exit codes, exceptions and their stack traces, etc.) and reports a health warning or error for that node.
Upvotes: 1