Felipe Aguirre

Reputation: 228

Docker workflow for scientific computing

I'm trying to imagine a workflow that could be applied in a scientific work environment. My work involves scientific coding, mostly with Python, pandas, NumPy and friends. Sometimes I have to use modules that are not common standards in the scientific community, and sometimes I have to integrate compiled code into my chain of simulations. Most of the time, the code I run is parallelised from the IPython notebook.

What do I find interesting about docker?

The fact that I could create a Docker image containing my code and its working environment. I could then send it to my colleagues without asking them to change their work environment, e.g., install an outdated version of a module just so they can run my code.

A rough draft of the workflow I have in mind goes as follows:

  1. Develop locally until I have a version I want to share with somebody.
  2. Build a Docker image, possibly with a build hook from a git repo.
  3. Share the image.
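In Docker CLI terms, steps 2 and 3 could look roughly like this (the image name my-simulation is a placeholder, and this assumes the recipient also has Docker installed):

```shell
# 2. Build an image from the Dockerfile at the root of the repo:
docker build -t my-simulation .

# 3a. Share it as a plain file, no registry needed:
docker save my-simulation | gzip > my-simulation.tar.gz

# A colleague then loads and runs it:
docker load < my-simulation.tar.gz
docker run -it my-simulation
```

Pushing to a registry (Docker Hub or a private one) with docker push is the alternative to the save/load route when both sides have network access to the registry.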

Can somebody give me some pointers on what I should take into account to develop this workflow further? One point intrigues me: can code running in a Docker container launch parallel processes on the several cores of the machine, e.g., an IPython notebook connected to a cluster?

Upvotes: 3

Views: 1888

Answers (2)

jaketbouma

Reputation: 141

Even though you'll have a full container, I think a package manager like conda can still be a solid part of the base image for your workflow.

FROM ubuntu:14.04
RUN apt-get update && apt-get install curl -y

# Install Miniconda (single RUN keeps the installer out of the image layers)
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh && \
    bash Miniconda-latest-Linux-x86_64.sh -p /miniconda -b && \
    rm Miniconda-latest-Linux-x86_64.sh
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda

* Adapted from a nice example showing Docker + Miniconda + Flask.

Regarding source activate <env> in a Dockerfile: each RUN instruction starts a fresh non-interactive shell, so you have to activate the environment and use it within the same command:

RUN /bin/bash -c "source activate <env> && <do something in the env>"
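Putting it together, a project Dockerfile built on the base above could create and use an environment like this (a sketch: the environment name sci and the package list are placeholders for your own setup):

```dockerfile
# Hypothetical continuation of the base image above.
RUN conda create -y -n sci numpy pandas ipython

# Each RUN gets a fresh shell, so activate and use the env in one command:
RUN /bin/bash -c "source activate sci && python -c 'import numpy; print(numpy.__version__)'"
```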

Upvotes: 1

Regan

Reputation: 8781

Docker can launch multiple processes/threads on multiple cores. Running several processes in one container may require a supervisor (see: https://docs.docker.com/articles/using_supervisord/ )
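As a quick sanity check that parallel Python behaves the same inside a container (by default Docker does not restrict a container to a single core), a minimal sketch:

```python
# Check how many cores are visible and run a trivial parallel map.
from multiprocessing import Pool, cpu_count

def square(x):
    return x * x

if __name__ == "__main__":
    print("cores visible:", cpu_count())
    pool = Pool()  # one worker per visible core by default
    results = pool.map(square, range(8))
    pool.close()
    pool.join()
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Run the same script via docker run and you should see the host's core count, unless you deliberately constrain the container (e.g., with the cpuset options).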

You should probably build an image that contains the things you always use and use it as a base for all your projects. (It would save you the pain of writing a complete Dockerfile each time.)

Why not develop directly in a container and use the commit command to save your progress to a local Docker registry? Then share the final image with your colleagues.
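A sketch of that commit-based loop (image and registry names are placeholders; localhost:5000 assumes the local registry from the link below):

```shell
# Work interactively in a container based on your base image...
docker run -it my-sci-base /bin/bash

# ...then, from the host, snapshot your progress:
docker commit <container-id> my-project:wip

# Tag and push the snapshot to the local registry:
docker tag my-project:wip localhost:5000/my-project:wip
docker push localhost:5000/my-project:wip
```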

How to make a local registry : https://blog.codecentric.de/en/2014/02/docker-registry-run-private-docker-image-repository/

Upvotes: 1
