Declan McNulty

Reputation: 3374

How to stop an idle Service Fabric Cluster Upgrade?

I have a Service Fabric cluster that has been stuck in the rollback phase of an automatic upgrade for over seven days.

This is the output from Get-ServiceFabricClusterUpgrade:

TargetCodeVersion             : 5.5.216.0
TargetConfigVersion           : 2
StartTimestampUtc             : 15/06/2017 23:44:40
FailureTimestampUtc           : 16/06/2017 01:41:48
FailureReason                 : HealthCheck
UpgradeState                  : RollingBackInProgress
UpgradeDuration               : 7.14:13:10
CurrentUpgradeDomainDuration  : 7.12:16:03
CurrentUpgradeDomainProgress  : 0

NodeName            : xxxxxxxxxxxxxxxxxxxxx
UpgradePhase        : PreUpgradeSafetyCheck
PendingSafetyChecks :
WaitForInbuildReplica - PartitionId: xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
NextUpgradeDomain             : 1
UpgradeDomainsStatus          : { "0" = "InProgress";
                                  "1" = "Pending";
                                  "2" = "Pending";
                                  "3" = "Pending";
                                  "4" = "Pending" }

The only other cmdlets in the Service Fabric PowerShell module that seem related are Start-ServiceFabricClusterUpgrade, Resume-ServiceFabricClusterUpgrade and Update-ServiceFabricClusterUpgrade.

I tried Start-ServiceFabricClusterUpgrade with the -Force switch, hoping it would cancel the hung upgrade and start a new one, but it did not. I also restarted the node that is in progress, but that made no difference either.

In the absence of a Stop-ServiceFabricClusterUpgrade, is there anything else I can do to stop this process?

Upvotes: 5

Views: 3907

Answers (3)

Declan McNulty

Reputation: 3374

What I did in the end was log on to the nodes in the cluster one by one and restart them, waiting for each node to come back up before restarting the next.

This fixed it, and the upgrade process eventually finished. Restarting the VMSS would probably have achieved the same thing, but I'm not sure whether there would have been a service outage during the restart. It certainly would have been less time-consuming.
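The one-node-at-a-time restart could be scripted roughly as below. This is only a sketch under some assumptions: it restarts the Service Fabric node process with Restart-ServiceFabricNode rather than rebooting the VM (which is what was actually done above), it assumes a session already opened with Connect-ServiceFabricCluster, and the 15-second polling interval is arbitrary.

```powershell
# Sketch only: assumes Connect-ServiceFabricCluster has already been run.
# Restarts the Service Fabric node process on each node in turn,
# not the underlying virtual machine.
foreach ($node in Get-ServiceFabricNode) {
    Write-Host "Restarting $($node.NodeName)"
    Restart-ServiceFabricNode -NodeName $node.NodeName -CommandCompletionMode Verify

    # Wait for the node to report Up again before touching the next one.
    do {
        Start-Sleep -Seconds 15   # arbitrary polling interval
        $status = (Get-ServiceFabricNode -NodeName $node.NodeName).NodeStatus
    } while ($status -ne 'Up')
}
```

Running Get-ServiceFabricClusterUpgrade between restarts shows whether the stuck upgrade domain has started to move.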

Upvotes: 2

Kiryl

Reputation: 1526

The "Troubleshoot application upgrades" documentation says:

"An UpgradePhase of PreUpgradeSafetyCheck means there were issues preparing the upgrade domain before it was performed. The most common issues in this case are service errors in the close or demotion from primary code paths."

So Service Fabric was probably unable to shut down the service executable. The easiest fix might be to deactivate (restart) the node named in the output, using Service Fabric Explorer.
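The same deactivation can probably be done from PowerShell instead of Service Fabric Explorer. A minimal sketch, assuming an existing Connect-ServiceFabricCluster session; "MyNode_0" is a placeholder for the node name shown in the upgrade output:

```powershell
# Sketch only: "MyNode_0" is a placeholder, not a real node name.
$nodeName = "MyNode_0"

# Deactivate the node with restart intent (-Force skips the confirmation prompt).
Disable-ServiceFabricNode -NodeName $nodeName -Intent Restart -Force

# Inspect the node until the deactivation has completed.
Get-ServiceFabricNode -NodeName $nodeName |
    Select-Object NodeName, NodeStatus, NodeDeactivationInfo

# Reactivate the node once it has been restarted.
Enable-ServiceFabricNode -NodeName $nodeName
```

Deactivation runs its own safety checks, so it may also wait on the same partition that is blocking the upgrade.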

Upvotes: 2

The Muffin Man

Reputation: 20004

Two ways that I can see you accomplishing this:

  • Kill the Service Fabric cluster and recreate it
  • or, preferably, restart the Virtual Machine Scale Set (effectively the same as restarting the servers). I'm sure there's a way to do this through PowerShell instead of the Azure portal.
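For the PowerShell route, something like the following should work with the current Az.Compute module (the older AzureRM module has an equivalent Restart-AzureRmVmss cmdlet). The resource group and scale set names are placeholders, and a signed-in session from Connect-AzAccount is assumed:

```powershell
# Sketch only: both names below are placeholders for your own resources.
$resourceGroup = "my-sf-rg"
$scaleSet      = "my-sf-vmss"

# Restart a single instance first, to limit the impact on the cluster.
Restart-AzVmss -ResourceGroupName $resourceGroup -VMScaleSetName $scaleSet -InstanceId "0"

# Or restart every instance in the scale set.
Restart-AzVmss -ResourceGroupName $resourceGroup -VMScaleSetName $scaleSet
```

Restarting one instance at a time is closer to the node-by-node approach and is less likely to cause a service outage.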


Upvotes: 1
