Service Fabric: unhealthy services affect other services

Question

I'm trying to understand the logic Service Fabric uses to consider a node in a cluster unhealthy.

I recently deployed a new version of our application that had 3 unhealthy worker services running on all nodes. They are very light services that load messages from a queue, but because of their frequent failures, all other services running on the same nodes were affected for some reason, and all services were reported as unhealthy.

I assume this behavior comes from Service Fabric health monitoring deciding that the node is not healthy because multiple services are failing on the same node. Is this right?

What measures does SF use to consider a node unhealthy?

masnider · Accepted Answer

Service Fabric's health model is described in detail in the official health monitoring documentation. The measures are always "health reports". Service Fabric emits some health reports on its own, but the model is also extensible and you can add your own.
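As a rough illustration of "adding your own", a custom health report against a node can be sent from PowerShell along these lines. This is only a sketch: the node name Node1, the source ID QueueWatchdog, and the property QueueBacklog are made-up values, and it assumes the Service Fabric PowerShell module is installed and a cluster is reachable.

```powershell
# Assumption: a local development cluster; pass -ConnectionEndpoint for a remote one.
Connect-ServiceFabricCluster

# Send a custom Warning report against a node.
# "QueueWatchdog" and "QueueBacklog" are hypothetical names chosen for this example.
Send-ServiceFabricNodeHealthReport -NodeName "Node1" `
    -HealthState Warning `
    -SourceId "QueueWatchdog" `
    -HealthProperty "QueueBacklog" `
    -Description "Worker services on this node keep failing while reading the queue." `
    -TimeToLiveSec 300 `
    -RemoveWhenExpired
```

With a time to live and -RemoveWhenExpired, the report should drop out of the health store on its own if the watchdog stops refreshing it, rather than lingering as a stale warning.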

Whether you've added new health reports or rely only on what the system emits by default, you can see which health reports are being emitted for a given node either by selecting that node in SFX (Service Fabric Explorer) or by running a command like the following:

Get-ServiceFabricNodeHealth -NodeName Node1

As described in the doc, node health is mainly determined by:

Health reports against that particular node (e.g. the node went down)
Failures of a Deployed Application
Failures of a particular Deployed Service Package (usually the code packages within it)

In these cases SF tries to grab as much information as it can about what failed (exit codes, exceptions and their stack traces, etc.) and reports a health warning or error for that node.
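To find out which of these sources is behind an unhealthy state, one option is to drill down from the node to the deployed application and then to the deployed service package. The following is a sketch, not a definitive procedure: fabric:/MyApp and WorkerServicePkg are placeholder names for your own application and service manifest, and Node1 is the node from the command above.

```powershell
# Assumption: a local development cluster; pass -ConnectionEndpoint for a remote one.
Connect-ServiceFabricCluster

# 1. Node level: aggregated state, the raw health events, and why it is unhealthy.
$nodeHealth = Get-ServiceFabricNodeHealth -NodeName "Node1"
$nodeHealth.AggregatedHealthState
$nodeHealth.HealthEvents
$nodeHealth.UnhealthyEvaluations

# 2. The application as deployed on that node (placeholder application name).
Get-ServiceFabricDeployedApplicationHealth -ApplicationName "fabric:/MyApp" -NodeName "Node1"

# 3. One service package of that application on that node (placeholder manifest name).
Get-ServiceFabricDeployedServicePackageHealth -ApplicationName "fabric:/MyApp" `
    -ServiceManifestName "WorkerServicePkg" -NodeName "Node1"
```

The health events at each level carry the source, property, and description of each report, which is usually where exit codes and exception details show up.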
