staticdev

Reputation: 3060

Why is a multi-stage Docker image bigger than a single-stage one?

I've created a microservice (https://github.com/staticdev/enelvo-microservice) that needs to clone a git repository while building its Docker image. With a single-stage Dockerfile, the final image is 759MB:

FROM python:3.7.6-slim-stretch

# set the working directory to /app
WORKDIR /app

# copy the current directory contents into the container at /app
COPY . /app

RUN apt-get update && apt-get install -y git \
 && pip install -r requirements.txt \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && cd enelvo-src \
 && python setup.py install \
 && cd .. \
 && mv enelvo-src/enelvo enelvo \
 && rm -fr enelvo-src

EXPOSE 50051

# run app.py when the container launches
CMD ["python", "app.py"]

I've tried using a multi-stage build (https://blog.bitsrc.io/a-guide-to-docker-multi-stage-builds-206e8f31aeb8) to reduce the image size by keeping git and the apt-get lists (from update) out of the final image:

FROM python:3.7.6-slim-stretch as cloner

RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src

FROM python:3.7.6-slim-stretch

COPY --from=cloner /enelvo-src /app/enelvo-src

# set the working directory to /app
WORKDIR /app

# copy the current directory contents into the container at /app
COPY . /app

RUN pip install -r requirements.txt \
 && cd enelvo-src \
 && python setup.py install \
 && cd .. \
 && mv enelvo-src/enelvo enelvo \
 && rm -fr enelvo-src

EXPOSE 50051

# run app.py when the container launches
CMD ["python", "app.py"]

The problem is that, after doing that, the final image got even bigger (815MB). Any idea what could be wrong in this case?

Upvotes: 2

Views: 1770

Answers (1)

David Maze

Reputation: 159781

In your first example you're running

RUN git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
    ... \
 && rm -fr enelvo-src

and so the enelvo-src tree never exists outside this particular RUN instruction; it's deleted before Docker can build a layer out of it.

In the second example you're running

COPY --from=cloner /enelvo-src /app/enelvo-src
RUN rm -fr enelvo-src

Docker internally creates an image layer after the first step that contains the content of that source tree. The subsequent RUN rm doesn't actually make the image smaller; it just records that the content from the earlier layer technically isn't part of the filesystem any more. The earlier layer, with all of its data, is still stored in the image.
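To see this effect in isolation, here is a minimal sketch that is not taken from the question: the stage names, the file /big.bin and its 50 MB size are all made up for illustration. Both stages create and then delete the same file; the only difference is whether the deletion happens in the same RUN instruction.

```dockerfile
# Hypothetical demo, not part of the original project.

# Stage "separate": the file is written in one layer and deleted in the next.
# The first layer still stores the 50 MB file, so the image grows.
FROM python:3.7.6-slim-stretch AS separate
RUN dd if=/dev/zero of=/big.bin bs=1M count=50
RUN rm /big.bin

# Stage "combined": the file is created and deleted inside a single RUN,
# so it is already gone when Docker snapshots the layer.
FROM python:3.7.6-slim-stretch AS combined
RUN dd if=/dev/zero of=/big.bin bs=1M count=50 \
 && rm /big.bin
```

Building each stage with docker build --target separate (or combined) and then running docker history on the resulting image should show the per-layer sizes; the "separate" image is expected to come out roughly 50 MB larger than the "combined" one, even though neither contains /big.bin at runtime.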

Generally the standard way to use a multi-stage build is to do as much building as you actually can in the earlier stage, and only COPY a final result into your runtime image. For Python packages, one approach that can work well is to build a wheel out of the package:

FROM python:3.7.6-slim-stretch as build
WORKDIR /build
RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && ... \
 && python setup.py bdist_wheel  # (not "install")

FROM python:3.7.6-slim-stretch
WORKDIR /app
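# bdist_wheel writes to a dist/ directory next to setup.py by default;
# adjust the source path below to wherever the wheel actually ends up
COPY --from=build /build/dist/wheel/enelvo*.whl .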
RUN pip install enelvo*.whl
...
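Filled in under some assumptions, the outline above might look like the following sketch. It assumes that the elided build step is just a cd into the checkout, that the project's setup.py supports bdist_wheel, and that the wheel therefore lands in /build/enelvo-src/dist/. The requirements.txt, port and app.py lines are carried over from the question; the /tmp/ destination is an arbitrary choice.

```dockerfile
# Build stage: git and the cloned source tree exist only here.
FROM python:3.7.6-slim-stretch AS build
WORKDIR /build
RUN apt-get update && apt-get install -y git \
 && git clone https://github.com/tfcbertaglia/enelvo.git enelvo-src \
 && cd enelvo-src \
 && python setup.py bdist_wheel
# Assumption: the wheel is now in /build/enelvo-src/dist/

# Runtime stage: only the built wheel and the application are copied in.
FROM python:3.7.6-slim-stretch
WORKDIR /app
COPY --from=build /build/enelvo-src/dist/*.whl /tmp/
COPY . /app
RUN pip install -r requirements.txt /tmp/*.whl
EXPOSE 50051
CMD ["python", "app.py"]
```

The copied wheel still occupies a layer of its own, but it holds only the packaged code and not the git history or the rest of the checkout, so it is typically much smaller than the full source tree. If the application relies on the enelvo directory being moved into /app, as the question's mv step suggests, that step would need separate handling.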

Upvotes: 2
