Reputation: 1267
I have a docker image which encapsulates some processing steps: A, B, C, with a linear dependency: A -> B -> C. Each step produces some artifacts (files) that will be required by subsequent steps.
What is a robust way of running this pipeline given these constraints?
A simple idea is to write a shell script that runs each step in order:
# run.sh
python step_a.py [args]
python step_b.py [args]
./step_c [args]
and define run.sh as the ENTRYPOINT of the docker image.
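For concreteness, a minimal sketch of what I have in mind (the set -euo pipefail line is an addition so the pipeline aborts as soon as any step fails; step names and arguments are placeholders, and "$@" just forwards whatever arguments are passed to the container):
#!/usr/bin/env bash
# run.sh: run the steps in order; exit immediately if any step fails
set -euo pipefail

python step_a.py "$@"
python step_b.py "$@"
./step_c "$@"
with ENTRYPOINT ["./run.sh"] in the Dockerfile.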
Would this be good enough? What are some potential caveats? Is there a better approach?
I would have preferred something like docker-compose, but even with depends_on, it's not guaranteed that subsequent steps will run only after the former steps have finished.
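To illustrate, a rough sketch of the compose file I was considering (image and service names are placeholders); by default depends_on only waits for the dependency's container to be started, not for it to finish:
# docker-compose.yml (sketch)
services:
  step_a:
    image: my-pipeline
    command: python step_a.py
  step_b:
    image: my-pipeline
    command: python step_b.py
    depends_on:
      - step_a
  step_c:
    image: my-pipeline
    command: ./step_c
    depends_on:
      - step_b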
Upvotes: 0
Views: 183
Reputation: 264
I think the most robust way to do this with a Dockerfile would be to use multi-stage builds.
At its core, a multi-stage build breaks the Dockerfile up into multiple smaller images that you can control more granularly, so for your use case you would have one stage per step. You can then copy the artifacts you need between stages. Finally, since you want an output rather than a long-running container, you would make the entrypoint the Rust binary and have it spit out whatever you need. This would look a little something like this:
# Stage 1: run the first Python step (the base image can be whatever you want)
FROM python:3.8 AS stage-1
COPY requirements_1.txt python_file_1.py ./
RUN pip install -r requirements_1.txt
RUN python python_file_1.py

# Stage 2: run the second Python step (again, whatever image you want)
FROM python:3.8 AS stage-2
COPY requirements_2.txt python_file_2.py ./
RUN pip install -r requirements_2.txt
# copy the artifact produced by python_file_1.py in stage 1; the paths here are obviously placeholders
COPY --from=stage-1 ./artifact_1 ./destination
RUN python python_file_2.py

# Final stage: the Rust binary that consumes the artifacts
FROM rust:1.31
COPY --from=stage-2 ./artifact_2 ./destination
# assumes the compiled binary is available in the build context
COPY ./rust_binary ./rust_binary
ENTRYPOINT ["./rust_binary"]
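Building the image then runs the first two stages (and therefore steps A and B) at build time, and running the resulting container executes the Rust binary for step C. Roughly, with a placeholder image tag:
docker build -t pipeline .
docker run --rm -v "$(pwd)/output:/output" pipeline
where /output stands for whatever directory the binary writes its results to; the bind mount is only needed if you want those files back on the host.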
Upvotes: 1