Best practice for pulling source code and using it in a docker image

Question

A total Docker newbie here.

I have a web application that uses two repositories. One of the repositories is basically a 'client' app, while the second one is the server. The server serves the static files from the client app.

I would like to dockerize the whole thing, and I'm wondering which of these is the best practice:

pull the client and build it inside the image
do the rest

Or

pull the client code in an external bash script
somehow copy the build files to the image
do the rest

Or

pull the client code in an external bash script
never put the client code in the image, use it externally somehow
do the rest

The first approach actually works, but it seems wasteful: the image is now very big and contains disposable files.

The second approach feels "better", but when I run docker-compose up from the bash script, I can't copy the files into the image, since the script is already running:

#!/bin/bash

git clone ... ~/tmp/client
(cd ~/tmp/client && yarn && yarn build && mv build ~/tmp/build)
docker-compose up
rm -rf ~/tmp/client

As for the third approach, I don't even know how to do that.

Any suggestion or reference would be very helpful.

Andreas Jägle · Accepted Answer

Great question! Even though there are several ways to solve this, some of these approaches have significant differences and drawbacks. Back in the day, the usual pattern was to build things outside the container (on the host) and then copy the relevant artifacts into the image if you wanted to avoid having all the SDKs and sources in your production image.

Luckily, there is a better way to solve this today: multi-stage Docker builds.

A multi-stage Dockerfile is like a regular Dockerfile, but it contains several stages (i.e. more than one FROM statement). Each stage starts a fresh image build from its own base image. Only the final stage (or the one you select with docker build --target) becomes the image you tag and push to your registry; the earlier stages just run intermediate build steps whose outputs later stages pick up with COPY --from.

Pseudo code

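FROM node:version AS frontend-build
WORKDIR /src
# Better: copy package.json/package-lock.json first, install, then copy the rest
# (Dockerfile comments must be on their own line; after COPY they would be read as file names)
COPY src/frontend .
# npm ci only installs dependencies, so run the build script afterwards
# (or: yarn install --frozen-lockfile && yarn build)
RUN npm ci && npm run build

FROM jdk-plus-buildtool:version AS backend-build
WORKDIR /app
COPY src/backend .
# or similar, depending on your build tool
RUN mvn package

FROM trimmed-down-runtime:version
WORKDIR /app
# Source paths after --from are absolute paths inside that stage, not relative to its WORKDIR
COPY --from=backend-build /app/target/myapp/ .
COPY --from=frontend-build /src/dist/ ./static-files-folder
# or ENTRYPOINT
CMD your-run-command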

Using this approach has several advantages:

Your final image contains only the minimal dependencies needed to run your application (e.g. JRE, Java application, static JavaScript files).
Nothing is built outside a container, which limits the effects of the environment on the build. Every required tool must be available in the build container, which makes the builds reliable and reproducible.
The build can easily be run on a developer machine and produces the same results, even if the developer has different versions of npm/Java installed locally.
No build tools, SDKs, source files or intermediate artifacts end up in your final image.
The backend part itself can also become smaller, because the production container no longer ships the SDK (e.g. the JDK for a Java app), only the runtime.
You can leverage the Docker build cache even more, because whole stages can be skipped if nothing in them changed (e.g. the Java build is reused if only JavaScript files changed).
You have more fine-grained control over the dependencies used in each build step, and the build itself has fewer interdependencies because the steps for the different technologies run in different containers.
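As a more concrete (but hypothetical) version of the pseudo code, here is a sketch assuming one repository with the client in frontend/ (built with yarn into build/, as in the question's script) and a Maven-based Java server in backend/ that produces target/myapp.jar. The image tags, folder names, jar name and the static/ folder are assumptions to adapt to your project. The main point is the ordering inside each stage: dependency manifests are copied and installed before the sources, so a source-only change reuses the cached install layer.

```dockerfile
# --- Stage 1: build the client (hypothetical frontend/ folder) ---
FROM node:20 AS frontend-build
WORKDIR /src
# Copy only the dependency manifests first, so the install layer below stays
# cached until package.json or yarn.lock changes.
COPY frontend/package.json frontend/yarn.lock ./
RUN yarn install --frozen-lockfile
# Copy the remaining sources (a .dockerignore entry for frontend/node_modules
# keeps the host's node_modules from overwriting the installed one).
COPY frontend/ ./
# Assumes the "build" script writes the static files to build/.
RUN yarn build

# --- Stage 2: build the server (hypothetical backend/ Maven project) ---
FROM maven:3.9-eclipse-temurin-17 AS backend-build
WORKDIR /app
COPY backend/pom.xml ./
# Download dependencies into their own cached layer.
RUN mvn -B dependency:go-offline
COPY backend/src ./src
# Assumes <finalName>myapp</finalName> in pom.xml, producing target/myapp.jar.
RUN mvn -B package -DskipTests

# --- Stage 3: runtime image with only the JRE, the jar and the static files ---
FROM eclipse-temurin:17-jre
WORKDIR /app
# Absolute source paths inside the earlier stages.
COPY --from=backend-build /app/target/myapp.jar ./myapp.jar
# Hypothetical folder the server serves static files from.
COPY --from=frontend-build /src/build/ ./static/
CMD ["java", "-jar", "myapp.jar"]
```

Built from the repository root (for example with docker build -t myapp . or a docker-compose service with build: .), a change that only touches frontend/src reruns just the last two instructions of the first stage; the Maven stage comes from the cache.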

If your client is a static JavaScript application and your server is an HTTP API backend, you could also use two separate images (frontend and backend) and set up networking and proxying so that only the frontend container is exposed to the world and all requests are routed through the frontend to the backend application.
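A minimal sketch of the frontend image for that two-image setup, assuming the client builds with yarn into build/ (as in the question's script) and that nginx both serves the static files and proxies API calls. The nginx.conf file, the /api/ prefix, the backend container name and port 8080 are all hypothetical.

```dockerfile
# Build stage: install dependencies and build the static client files.
FROM node:20 AS build
WORKDIR /src
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
COPY . .
RUN yarn build

# Runtime stage: only nginx and the built files, no Node.js or sources.
FROM nginx:1.27-alpine
# Serve the client from nginx's default document root.
COPY --from=build /src/build/ /usr/share/nginx/html/
# Hypothetical nginx.conf (in the build context) replacing the default server
# block; besides serving the files it could contain, e.g.:
#   location /api/ { proxy_pass http://backend:8080; }
# where "backend" is the backend container's name on a shared Docker network.
COPY nginx.conf /etc/nginx/conf.d/default.conf
```

Only this container publishes a port; the backend container joins the same network (in docker-compose, services on the default network reach each other by service name) without publishing any ports, so it is reachable only through the proxy.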

One more comment: you mention separate repositories for client and server. Usually the CI environment takes care of checking out the desired versions of your code before the real build starts. If this server is used only by this one client, I would use the bundled approach and also move the client sources into a subfolder of the main server repository. This makes it easier to fix bugs across the whole system with a single bugfix branch. If you really cannot move source code between repositories, I would use a git submodule or subtree to avoid managing commit references by hand during the build.
