Reputation: 1267
I have a docker image which encapsulates some processing steps: A, B, C, with a linear dependency: A -> B -> C. Each step produces some artifacts (files) that will be required by subsequent steps.
What is a robust way of running this pipeline given these constraints?
A simple idea is to write a shell script that runs each step in order:
# run.sh
python step_a.py [args]
python step_b.py [args]
./step_c [args]
and define run.sh as the ENTRYPOINT of the docker image.
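For concreteness, a minimal sketch of what I have in mind (the set -euo pipefail line is an addition so the pipeline aborts as soon as any step fails; step names and arguments are placeholders, and "$@" just forwards whatever arguments are passed to the container):
#!/usr/bin/env bash
# run.sh: run the steps in order; exit immediately if any step fails
set -euo pipefail

python step_a.py "$@"
python step_b.py "$@"
./step_c "$@"
with ENTRYPOINT ["./run.sh"] in the Dockerfile.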
Would this be good enough? What are some potential caveats? Is there a better approach?
I would have preferred something like docker-compose, but even with depends_on, it's not guaranteed that subsequent steps will run only after the former steps have finished.
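To illustrate, a rough sketch of the compose file I was considering (image and service names are placeholders); by default depends_on only waits for the dependency's container to be started, not for it to finish:
# docker-compose.yml (sketch)
services:
  step_a:
    image: my-pipeline
    command: python step_a.py
  step_b:
    image: my-pipeline
    command: python step_b.py
    depends_on:
      - step_a
  step_c:
    image: my-pipeline
    command: ./step_c
    depends_on:
      - step_b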
Upvotes: 0
Views: 183
Reputation: 264
I think the most robust way to do this with a Dockerfile would be to use multi-stage builds.
At its core, a multi-stage build breaks the Dockerfile up into multiple smaller images that you can control more granularly, so for your use case you would have one stage per step. You can then copy the artifacts you need between stages. Finally, since you want an output rather than a long-running container, you would make the entrypoint the Rust binary and have it spit out whatever you need. This would look a little something like this:
# Stage 1: run the first Python step (the base image can be whatever you want)
FROM python:3.8 AS stage-1
COPY requirements_1.txt python_file_1.py ./
RUN pip install -r requirements_1.txt
RUN python python_file_1.py

# Stage 2: run the second Python step (again, whatever image you want)
FROM python:3.8 AS stage-2
COPY requirements_2.txt python_file_2.py ./
RUN pip install -r requirements_2.txt
# copy the artifact produced by python_file_1.py in stage 1; the paths here are obviously placeholders
COPY --from=stage-1 ./artifact_1 ./destination
RUN python python_file_2.py

# Final stage: the Rust binary that consumes the artifacts
FROM rust:1.31
COPY --from=stage-2 ./artifact_2 ./destination
# assumes the compiled binary is available in the build context
COPY ./rust_binary ./rust_binary
ENTRYPOINT ["./rust_binary"]
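Building the image then runs the first two stages (and therefore steps A and B) at build time, and running the resulting container executes the Rust binary for step C. Roughly, with a placeholder image tag:
docker build -t pipeline .
docker run --rm -v "$(pwd)/output:/output" pipeline
where /output stands for whatever directory the binary writes its results to; the bind mount is only needed if you want those files back on the host.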
Upvotes: 1