Ignore Failed Jobs in Azure DevOps Release Pipeline Status

Question

I've created a release pipeline that includes 3 jobs:

The first job attempts to run a quick deployment via a self-hosted agent deployment group - but this is expected to fail if the agent in that deployment group is offline (VM is shut down) with the error "Unable to deploy to the target '' as the target is offline.".

This failure is instantaneous - it happens before any tasks are run. As such, the "continueOnError" property does nothing here: it is a feature of tasks, so it has no effect when the job fails before it ever runs a task.

I have a second job which runs only if the first job fails: an Azure-hosted Agent job which invokes the Azure CLI to start the Agent VM (az vm start ...) via a service principal.

A third job (which also runs only if the first has failed) is a copy of the first job, which can now rely on the VM being up.
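
To make the intended flow concrete, here is a rough Python sketch of the three-job logic. It is not the pipeline definition itself; TargetOfflineError, deploy and start_vm are hypothetical stand-ins for the deployment-group job, its "target is offline" failure, and the Azure CLI job.

```python
from typing import Callable, List


class TargetOfflineError(Exception):
    """Hypothetical stand-in for the 'target is offline' job failure."""


def deploy_with_fallback(deploy: Callable[[], None],
                         start_vm: Callable[[], None]) -> List[str]:
    """Mimic the three jobs and return the names of the jobs that ran."""
    ran = ["job1_deploy"]
    try:
        deploy()          # Job 1: quick deployment via the self-hosted agent
        return ran        # VM was already up, nothing else to do
    except TargetOfflineError:
        pass              # Job 1 failed instantly because the agent is offline

    ran.append("job2_start_vm")
    start_vm()            # Job 2: start the Agent VM

    ran.append("job3_deploy_retry")
    deploy()              # Job 3: same deployment as Job 1, VM now running
    return ran
```

In this sketch the Job 1 failure is swallowed and the flow as a whole succeeds, which is the outcome I would like the release status to reflect.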

The overall release has now succeeded as far as I'm concerned, but shows as "Failed" due to the first job failing. Is there any logging command or other trick I can employ to instead have the release appear as "Succeeded" or at least "Succeeded with Warnings"?
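
For reference, the kind of logging command I have in mind is task.complete, which a script prints to standard output to set the result of the task that emits it. The helper below is only a sketch that formats that line (the function name is hypothetical). As far as I understand it, the command affects the current task, so whether it can change the status of a release in which an earlier job has already failed is exactly what I'm unsure about.

```python
ALLOWED_RESULTS = {"Succeeded", "SucceededWithIssues", "Failed"}


def task_complete_command(result: str, message: str = "") -> str:
    """Format an Azure DevOps 'task.complete' logging command line."""
    if result not in ALLOWED_RESULTS:
        raise ValueError(f"unsupported result: {result!r}")
    return f"##vso[task.complete result={result};]{message}"


if __name__ == "__main__":
    # Printing the line from a script task is what the agent picks up.
    print(task_complete_command("SucceededWithIssues", "Agent VM was offline"))
```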

Some other things I've tried

Leaving the Agent VM on all the time

This is expensive. We prefer to let it auto-shut down each evening and stay off on weekends, holidays, and days when we're doing things other than code churn.

Always checking if the VM is online first

The reason I don't just always run the Azure CLI job that starts the Agent VM is that it is slow - even when the Agent VM is already running.

The Azure CLI script must run on the "Microsoft-Hosted" agent pool, where it sits in a queue for anywhere from 1 to 10 minutes (free tier, shared with other projects). Even once the job gets a host and starts, several more minutes have passed by the time az vm show --query powerState has been run to check the VM status.
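
The pre-check itself amounts to something like the following Python sketch wrapped around the Azure CLI. The resource group and VM name parameters are placeholders, and the --show-details flag and the "VM running" value reflect my understanding of how az vm show reports powerState, so treat both as assumptions.

```python
import subprocess
from typing import List


def power_state_command(resource_group: str, vm_name: str) -> List[str]:
    """Build the az command that prints only the VM's power state."""
    return ["az", "vm", "show", "--show-details",
            "--resource-group", resource_group, "--name", vm_name,
            "--query", "powerState", "--output", "tsv"]


def start_command(resource_group: str, vm_name: str) -> List[str]:
    """Build the az command that starts the VM."""
    return ["az", "vm", "start",
            "--resource-group", resource_group, "--name", vm_name]


def needs_start(power_state: str) -> bool:
    """True unless the CLI reported the VM as running."""
    return power_state.strip().lower() != "vm running"


def ensure_vm_running(resource_group: str, vm_name: str) -> bool:
    """Start the VM if it is not running; return True if a start was issued."""
    shown = subprocess.run(power_state_command(resource_group, vm_name),
                           capture_output=True, text=True, check=True)
    if needs_start(shown.stdout):
        subprocess.run(start_command(resource_group, vm_name), check=True)
        return True
    return False
```

The logic is trivial; the cost is entirely in waiting for a hosted agent to pick up the job and run it.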

But if the self-hosted VM agent is already running, it responds immediately and Job 1 succeeds within 10 seconds. The VM startup only has to happen once per day, so we'd rather have most CD deployments take 10 seconds and have the first one of the day artificially appear failed than have them all take 3-5 minutes due to the pre-check.

Pre-checking if the agent is online a different way

I've tried using an "agentless job" to invoke the REST API at https://dev.azure.com//_apis/distributedtask/pools//agents/ and read the status property. This works from my machine to tell whether the agent is already online, but I cannot find any way to invoke this request from a DevOps pipeline without getting some sort of authentication failure.
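
In Python terms, the check I run locally looks roughly like the sketch below. The organization, pool_id and agent_id parameters are placeholders for the values elided from the URL above, the api-version value is an assumption, and the function names are hypothetical.

```python
import json
from urllib.parse import quote

API_VERSION = "7.1"  # assumption: any api-version the organization supports


def agent_url(organization: str, pool_id: int, agent_id: int) -> str:
    """Build the agent endpoint URL from placeholder identifiers."""
    return (f"https://dev.azure.com/{quote(organization)}"
            f"/_apis/distributedtask/pools/{pool_id}/agents/{agent_id}"
            f"?api-version={API_VERSION}")


def agent_is_online(response_body: str) -> bool:
    """Read the 'status' property from the JSON response body."""
    agent = json.loads(response_body)
    return str(agent.get("status", "")).lower() == "online"
```

Only the status property matters here; everything else in the response is ignored.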

The closest I got was using $(System.AccessToken) (the OAuth token) in the request, but then the response the pipeline gets is different from the one I get locally. It gets the "No agent found for pool with identifier " response that one gets when one has access to the organization in question but lacks the roles to see job pool info.
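
For completeness, this is roughly how the OAuth token would be attached if the request were made from a script rather than an agentless task. It is a sketch only: it assumes the token has been made available to the script as the SYSTEM_ACCESSTOKEN environment variable, the function names are hypothetical, and it does nothing about the permissions problem described above.

```python
import os
import urllib.request
from typing import Dict


def bearer_headers(token: str) -> Dict[str, str]:
    """Headers that send the pipeline's OAuth token as a Bearer token."""
    if not token:
        raise ValueError("no access token available")
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


def build_request(url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the authenticated GET request."""
    return urllib.request.Request(url, headers=bearer_headers(token), method="GET")


if __name__ == "__main__":
    # Assumption: the token was mapped into the environment for this script.
    token = os.environ.get("SYSTEM_ACCESSTOKEN", "")
    print("token present:", bool(token))
```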
