Paebbels

Reputation: 16249

How to split a 60GB docker layer to achieve better performance?

I'm installing AMD/Xilinx Vivado in a Docker image. Even an almost minimal installation is 60 GB in size, resulting in a 29 GB compressed image. A full installation is somewhere around 150 GB...

The image is created by:

  1. Using Debian Bookworm (debian:bookworm-slim) from Docker Hub
  2. Adding Python 3.12 (actually using python:3.12-slim from Docker Hub)
  3. Adding the packages needed by AMD Vivado via apt install ....
  4. Installing AMD Vivado as an almost minimal setup via ./xsetup ....

Once the software was installed, I noticed:

  1. Upload Problems
    • dockerd pushes a single layer with only around 135 Mb/s in a 1GbE network setup.
  2. Download Problems
    • dockerd is limited to at most 3 parallel download threads. Unlike docker push, docker pull achieves 975 Mb/s (the maximum of 1GbE). See Docker parallel operations limit
    • A downloaded layer is not extracted on the fly; it must be fully downloaded before extraction can start.
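As a side note, both thread limits are configurable in the Docker daemon configuration (the keys max-concurrent-downloads and max-concurrent-uploads in /etc/docker/daemon.json; the defaults are 3 and 5):

```json
{
  "max-concurrent-downloads": 8,
  "max-concurrent-uploads": 8
}
```

Raising them only helps when an image consists of more than one layer, which is exactly why splitting the big layer matters here.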

I found hints that splitting big layers into multiple layers improves performance. So I used a multi-stage build process where Vivado is installed in stage 1, and stage 2 mounts the result of stage 1 and uses RUN --mount... cp -r SRC DEST to create 15 layers between 1.5 and 8.8 GB.
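To pick split points of roughly even size, the installed tree can be inspected first (a sketch; the Vivado install path and version directory are placeholders and can be overridden via VIVADO_DIR):

```shell
#!/bin/sh
# List the top-level entries of the install tree, biggest first, to pick
# directories of similar size for the per-layer cp commands.
VIVADO_DIR="${VIVADO_DIR:-/opt/Xilinx/Vivado/2024.2}"
du -sk "${VIVADO_DIR}"/* 2>/dev/null | sort -rn | head -n 15
```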

The results show this:

  1. For docker push
    • dockerd is limited to at most 5 parallel upload threads.
      See Docker parallel operations limit
    • The parallel upload reaches around 550 Mb/s, roughly 4x the speed of a single upload thread.
  2. For docker pull
    • Downloading one big 60 GB layer takes the same time as downloading 15 layers with 3 parallel download threads: about 5 minutes, limited by the 1GbE network setup.
    • The 15-layer 60 GB image is fast, because finished layers are extracted by another, single (!) thread while the remaining layers are still downloading. Overall it took 5 minutes of download time plus 2 minutes to extract the layers still pending after all layers were downloaded.
    • The single-layer 60 GB image is almost twice as slow: it downloads the same data in 5 minutes, but then runs a single-threaded extraction which takes 8 minutes. That results in a total of 13 vs. 7 minutes.
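The totals can be sanity-checked with simple arithmetic (all numbers taken from the measurements above):

```shell
#!/bin/sh
download=5            # minutes, identical for both variants (1GbE bound)
extract_monolithic=8  # single-threaded extraction after the full download
extract_split_tail=2  # extraction work left over after overlapped download
echo "monolithic: $((download + extract_monolithic)) min"
echo "split:      $((download + extract_split_tail)) min"
```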

So here is my question: how can a 60 GB Docker layer be split into multiple layers to achieve better performance?

My first idea is a manually created Dockerfile to check the performance differences:

ARG REGISTRY
ARG NAMESPACE
ARG IMAGE
ARG IMAGE_TAG
ARG VIVADO_VERSION

FROM ${REGISTRY}/${NAMESPACE}/${IMAGE}:${IMAGE_TAG} AS base
ARG VIVADO_VERSION

# Install further dependencies for Vivado
RUN --mount=type=bind,target=/context \
    apt-get update \
 && xargs -a /context/Vivado.packages apt-get install -y --no-install-recommends \
 && rm -rf /var/lib/apt/lists/* \
 && apt-get clean


FROM base AS monolithic
ARG VIVADO_VERSION

RUN --mount=type=bind,target=/context \
    --mount=type=bind,from=vivado,target=/Install \
    cd /Install; \
    ./xsetup --batch Install -c /context/Vivado.${VIVADO_VERSION}.cfg --agree XilinxEULA,3rdPartyEULA


FROM base
ARG VIVADO_VERSION

RUN mkdir -p /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx

RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/gnu                           /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/ids_lite                      /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/lib                           /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/tps                           /opt/Xilinx/Vivado/${VIVADO_VERSION}
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/secureip                 /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/xsim                     /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/deca                     /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/ip                       /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/simmodels                /opt/Xilinx/Vivado/${VIVADO_VERSION}/data
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/zynquplus   /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/virtexuplus /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/kintexuplus /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -r /Install/data/parts/xilinx/common      /opt/Xilinx/Vivado/${VIVADO_VERSION}/data/parts/xilinx
RUN --mount=type=bind,from=monolithic,source=/opt/Xilinx/Vivado/${VIVADO_VERSION},target=/Install cp -ru /Install/*                            /opt/Xilinx/Vivado/${VIVADO_VERSION}

# Configure Vivado tools
COPY FlexLM.config.sh Vivado.config.sh /tools/GitLab-CI-Scripts/
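For reference, a sketch of how this Dockerfile could be invoked (registry, namespace, tag, and path values are placeholders); the --mount=type=bind,from=vivado in the Dockerfile expects an additional named build context providing the installer:

```shell
# Build the split variant (the final, unnamed stage) with BuildKit.
docker buildx build \
  --build-arg REGISTRY=registry.example.com \
  --build-arg NAMESPACE=eda \
  --build-arg IMAGE=python \
  --build-arg IMAGE_TAG=3.12-slim \
  --build-arg VIVADO_VERSION=2024.2 \
  --build-context vivado=/path/to/vivado-installer \
  -t vivado:split .
```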

Upvotes: 3

Views: 411

Answers (0)
