Reputation: 85
I use GitLab CI to deploy my service. Currently I have a single deploy job that releases my changes to every server behind a load balancer.
I would like to do a phased rollout where I deploy to one server in my load balancer, give it a few minutes to bake and set off any alarms if there is an issue, and then automatically continue deploying to the remaining servers. If any issue occurred before the delayed full automatic deploy happened, I would manually cancel that job to prevent the bad change from going out more widely.
With this goal in mind I configured my pipeline with the following .gitlab-ci.yml:
stages:
  - canary_deploy
  - full_deploy

canary:
  stage: canary_deploy
  allow_failure: false
  when: manual
  script: make deploy-canary

full:
  stage: full_deploy
  when: delayed
  start_in: 10 minutes
  script: make deploy-full
This works relatively well but I ran into a problem when I tried to push a critical change out quickly. The canary deploy script was hanging and this prevented the second job from starting as it must wait for the first stage to complete. In this case I would have preferred to skip the canary entirely but because of the way the pipeline is configured it was not possible to manually invoke the full deploy.
Ideally I would like the full_deploy stage to run on the typical delay but allow me to forcefully start it if I didn't want to wait. I've reviewed the rules, needs, and when configuration options hoping to find a way to achieve my goal, but I haven't been able to find a working solution.
Some things I've tried, without luck:
- Adding a second full_deploy job which is manual and does not depend on the canary_deploy stage, but it feels a bit hacky. And in reality my configuration is a bit more complex than what I've distilled here, so there are actually several region-specific deploy jobs and I would prefer not to have to duplicate each of them.
- Using rules to consider the status of the prior stage and make the full_deploy manual unless the prior stage was successful. This isn't possible because rules are evaluated on pipeline creation and cannot dynamically adjust this property at runtime.
- Setting canary_deploy to allow failure, which effectively unblocked the second stage immediately. The problem here is that it caused the delay timer to start counting down immediately upon pipeline creation rather than waiting for the first stage to complete.
Upvotes: 3
Views: 1418
Reputation: 36
One thing you could do to make duplicating the full_deploy job feel a little bit less "hacky" is to define it once and then use extends twice:
stages:
  - canary_deploy
  - full_deploy

.full:
  script: make deploy-full

canary:
  stage: canary_deploy
  allow_failure: false
  when: manual
  script: make deploy-canary

full_automatic:
  extends: .full
  stage: full_deploy
  when: delayed
  start_in: 10 minutes

full_manual:
  stage: full_deploy
  extends: .full
  when: manual
  needs: []
This way, you only need to define the script section once, and both the full_manual and the full_automatic jobs use it. When running the pipeline, you can choose which job to run first (manual versus canary):
Screenshot of the GitLab UI for selecting which job to run
By specifying needs: [], you tell GitLab that the full_manual job does not depend on any other jobs and can be executed immediately, without first running the jobs from canary_deploy.
When executing full_manual, the canary job is not executed:
Overview of executed pipeline jobs
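Since the question mentions several region-specific deploy jobs, here is a rough sketch of how the same idea might scale without copying the script. It assumes the deploy target can read a REGION variable; the region names (us, eu), the REGION variable, and the .delayed and .forced template names are all hypothetical and not taken from the question. It relies on extends accepting a list of hidden jobs, so the "what to run" and "when to run" parts are each written once.

```yaml
# Sketch only: REGION, the region names, and the .delayed/.forced
# template names are made up for illustration.
stages:
  - canary_deploy
  - full_deploy

# What to run (shared by every full deploy job).
.full:
  stage: full_deploy
  script: make deploy-full

# When to run: automatically, 10 minutes after the canary stage succeeds.
.delayed:
  when: delayed
  start_in: 10 minutes

# When to run: on demand, without waiting for the canary stage.
.forced:
  when: manual
  needs: []

canary:
  stage: canary_deploy
  allow_failure: false
  when: manual
  script: make deploy-canary

full_automatic_us:
  extends: [.full, .delayed]
  variables:
    REGION: us

full_manual_us:
  extends: [.full, .forced]
  variables:
    REGION: us

full_automatic_eu:
  extends: [.full, .delayed]
  variables:
    REGION: eu

full_manual_eu:
  extends: [.full, .forced]
  variables:
    REGION: eu
```

Each region still needs two short job entries, but they contain only the region value, so a change to the deploy script or the delay happens in one place.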
Upvotes: 2