AdamsTips

Reputation: 1776

How can I persist a docker image instance between stages of a GitLab pipeline?

In the last couple of weeks I have been setting up my first pipeline using the public shared runners on GitLab.com for a PHP project in a private repository. The pipeline is pretty simple at this point, defining two stages:

stages:
  - test
  - deploy

The test stage runs composer update -o to build the project dependencies, connects to a remote database server, and runs the Codeception testing framework to test the build and generate code coverage reports.

The deploy stage runs composer update --no-dev -o to rebuild the project with only the production dependencies and uses rsync to push the files to the production webserver.
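
For reference, here is a simplified sketch of the two jobs (the image tag, Codeception flags, and rsync destination are placeholders rather than my exact values):

image: php:7.1

test:
  stage: test
  script:
    # build all dependencies, including dev packages needed for testing
    - composer update -o
    # run the Codeception suite with code coverage
    - php vendor/bin/codecept run --coverage

deploy:
  stage: deploy
  script:
    # rebuild with production dependencies only
    - composer update --no-dev -o
    # push the build to the production webserver
    - rsync -az --delete ./ deploy@www.example.com:/var/www/project/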

This is all working right now, but each stage repeats the whole process of pulling the Docker image, installing dependencies, and checking the project out from git. It seems like it would be a whole lot more efficient to load the Docker image and project once, then run the test and deploy stages one after the other using the same persistent build instance.

I realize that in many cases you do want a fresh instance for each stage, but for my project this feels rather inefficient in both time and server resources.

I could configure everything to run in a single stage, which would eliminate the redundant Docker image pulls, but then I would lose the pipeline functionality in GitLab that shows which stages failed and lets later stages depend on the success of the preceding ones.
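
Roughly, that combined single-stage version would look like this (again simplified):

build_test_deploy:
  stage: deploy
  script:
    # test build with dev dependencies, then rebuild and deploy in one job
    - composer update -o
    - php vendor/bin/codecept run --coverage
    - composer update --no-dev -o
    - rsync -az --delete ./ deploy@www.example.com:/var/www/project/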

From my review of the documentation and several related questions, it seems like this might have to do with the architecture of how this process works, where jobs are independent of each other (and can even be processed by different runners) and are organized into stages on a pipeline.

What I have is certainly workable (if a little slow), but I thought I would ask the question here in case there is something I am missing that would make this process more efficient while still retaining the CI pipeline functionality.

Upvotes: 14

Views: 6302

Answers (1)

Adam Marshall

Reputation: 7725

I know this is an old question, but I want to provide an answer for anyone who has the same issue.

There's a config option for the GitLab Runner application itself that controls whether the runner will use a local copy of an image. If you manage and use your own runners (even when using gitlab.com) you have full control over this option, but if you use the shared runners provided by GitLab, you cannot change it.

Here are the three "pull policies" you can use:

  1. Never. The never pull policy instructs the runner to never pull images from Docker Hub or any other registry, and to use only images already present on the Docker host. This allows full control over the images and versions your runners use.
  2. If Not Present. The if-not-present policy instructs the runner to first check whether the image is available locally and, if so, to use it. Otherwise, it pulls the image from its registry.
  3. Always. The always policy instructs the runner to ignore any local images and pull from the registry every time the job runs.
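
On a runner you manage yourself, the pull policy is set in the runner's config.toml under the Docker executor section. A minimal sketch (the name, url, and token values are placeholders):

[[runners]]
  name = "my-project-runner"
  url = "https://gitlab.com/"
  token = "RUNNER_TOKEN"
  executor = "docker"
  [runners.docker]
    image = "php:7.1"
    # use a locally cached image when available instead of pulling every job
    pull_policy = "if-not-present"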

For the shared runners on gitlab.com, the pull policy is set to always to serve the needs of most users. The solution to this issue is to register your own runner(s) for your projects (which you can run on AWS EC2, your own laptop/workstation, etc.).

Here is the information on the available configuration options when running your own GitLab Runner.

Here are specific details on the available Pull Policies, and when to use them (or not).

Here is how to register a runner to your projects (or to your entire instance if using self-hosted GitLab).
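
For example, registering a Docker-executor runner with the if-not-present pull policy could look something like this (the URL and token are placeholders for your own values):

gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.com/" \
  --registration-token "PROJECT_REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "php:7.1" \
  --docker-pull-policy "if-not-present"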

Upvotes: 3
