Alexandru Dinu

Reputation: 1267

Single entrypoint for pipeline steps in a docker image

I have a Docker image that encapsulates some processing steps, A, B and C, with a linear dependency: A -> B -> C. Each step produces some artifacts (files) that are required by the subsequent steps.

What is a robust way of running this pipeline given these constraints?


A simple idea is to write a shell script that runs each step in turn:

# run.sh

python step_a.py [args]
python step_b.py [args]
./step_c [args]

and define run.sh as the ENTRYPOINT of the Docker image.

Would this be good enough? What are some potential caveats? Is there a better approach?
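One common hardening of this script is to make it fail fast and to check that each step actually left its artifact behind before the next one starts. The sketch below assumes a POSIX sh; the run_step helper and its "expected artifact" convention are hypothetical, not something from the question.

```shell
#!/bin/sh
# run.sh: a minimal sketch of a fail-fast pipeline entrypoint.
# Exit on the first failing command and on any use of an unset variable.
set -eu

# run_step NAME ARTIFACT CMD [ARGS...]   (hypothetical helper)
# Runs CMD and fails unless it exits 0 AND leaves ARTIFACT behind.
run_step() {
    name=$1
    artifact=$2
    shift 2
    echo "[pipeline] $name: starting" >&2
    if ! "$@"; then
        echo "[pipeline] $name: command failed" >&2
        return 1
    fi
    if [ ! -e "$artifact" ]; then
        echo "[pipeline] $name: expected artifact '$artifact' is missing" >&2
        return 1
    fi
    echo "[pipeline] $name: done" >&2
}
```

run.sh would then chain the steps as run_step A out/a.json python step_a.py, run_step B out/b.json python step_b.py, and so on, where the artifact paths are placeholders. Because of set -eu, the first failing step stops the script with a non-zero exit code, which becomes the container's exit status. If the last step needs no post-check, running it as exec ./step_c replaces the shell, so signals such as the SIGTERM sent by docker stop go to step C instead of to the shell.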

I would have preferred something like docker-compose, but even with depends_on, it's not guaranteed that subsequent steps will run only after the previous steps have finished.

Upvotes: 0

Views: 183

Answers (1)

sami-amer

Reputation: 264

I think the most robust way to do this with a Dockerfile is to use a multi-stage build.

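At its core, a multi-stage build splits the Dockerfile into several stages, each starting from its own base image, which you can control more granularly. For your use case, you would have one stage per step and copy the artifacts each step needs from the previous stage. Note that RUN instructions execute during docker build, so steps A and B run once when the image is built and their artifacts are baked into the final image. Finally, since you want an output rather than a long-running container, you would make the entrypoint the step C binary (assumed here to be written in Rust) and have it produce whatever you need. It would look something like this: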

# Stage 1: run the first Python file (all file names and paths are placeholders)
FROM python:3.8 AS stage-1
WORKDIR /app
COPY requirements_1.txt python_file_1.py ./
RUN pip install -r requirements_1.txt
# assumed to write /app/artifact_1
RUN python python_file_1.py

# Stage 2: run the second Python file on the artifact from stage 1
FROM python:3.8 AS stage-2
WORKDIR /app
COPY requirements_2.txt python_file_2.py ./
RUN pip install -r requirements_2.txt
COPY --from=stage-1 /app/artifact_1 ./artifact_1
# assumed to write /app/artifact_2
RUN python python_file_2.py

# Stage 3: final image whose entrypoint is the Rust binary
FROM rust:1.31
WORKDIR /app
COPY --from=stage-2 /app/artifact_2 ./artifact_2
# assumes a prebuilt rust_binary in the build context
COPY rust_binary ./rust_binary
ENTRYPOINT ["./rust_binary"]

The basic gist:

  1. Make a Python image and install the prerequisites for the first Python file
  2. Run the first Python file
  3. Make another Python image and install the prerequisites for the second Python file
  4. Copy the needed artifact from the first stage into the current (second) stage
  5. Run the second Python file
  6. Make a Rust image and install anything else needed
  7. Copy the needed artifact from the second stage into the current (third) stage
  8. Set the entrypoint to the Rust binary, which should produce your output

Upvotes: 1
