user2277146

Reputation: 41

How to use data in a docker container?

After installing Docker and googling for hours, I can't figure out how to place data in a Docker container; it seems to become more complex by the minute.

What I did; installed Docker and ran the image that I want to use (kaggle/python). I also read several tutorials about managing and sharing data in Docker containers, but no success so far...

What I want: for now, I simply want to be able to download GitHub repositories and other data to a Docker container. Where and how do I need to store these files? I'd prefer using a GUI, or even my GitHub GUI, but simple commands would also be fine, I suppose. Is it also possible to place data in, or access data from, a Docker container that is currently not running?

Upvotes: 3

Views: 2592

Answers (3)

Shabaz Patel

Reputation: 291

After pulling the image, you can use code like this in the shell:

docker run --rm -it -p 8888:8888  -v d:/Kaggles:/d  kaggle/python

Then run Jupyter Notebook inside the container:

jupyter notebook --ip=0.0.0.0 --no-browser

This mounts the local directory d:/Kaggles into the container at /d, so the container has access to it.
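If you want to verify the mount from inside the container, something like this should work (assuming the -v d:/Kaggles:/d mapping from the command above):

```shell
# Inside the running container: list the mounted host directory
ls /d

# Files created under /d appear in d:/Kaggles on the host,
# because /d is a bind mount, not a copy
touch /d/test-from-container.txt
```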

Then go to the browser and hit http://localhost:8888; when I open a new kernel, it runs Python 3.5. I don't recall doing anything special when pulling the image or setting up Docker.


You can also try using datmo to easily set up the environment and track machine learning projects, making experiments reproducible. You can run the datmo task command as follows to set up a Jupyter notebook:

datmo task run 'jupyter notebook' --port 8888

It sets up your project and files inside the environment to keep track of your progress.

Upvotes: 1

user2277146

Reputation: 41

For others who prefer using a GUI, I ended up using Portainer. After installing Portainer (which is done with one simple command), you can open the UI by browsing to where it is running; in my case:

http://127.0.1.1:9000

There you can create a container. First specify a name and an image, then scroll down to 'Advanced container options' > Volumes > 'map additional volume'. Click the 'Bind' button, specify a path in the container (e.g. '/home') and the path on your host, and you're done!

Add files to this host directory while your container is not running, then start the container and you will see your files in there. The other way around, accessing files created by the container, is also possible while the container is not running.
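For reference, the bind mount configured through the Portainer UI is roughly equivalent to this plain docker command (the image name and host path here are just examples):

```shell
# Bind-mount a host directory to /home inside the container
docker run -it -v /path/on/host:/home kaggle/python bash
```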

Note: I'm not sure whether this is the correct way of doing things. I will, however, edit this post as soon as I encounter any problems.

Upvotes: 1

Grimmy

Reputation: 4137

Note that I assume you are using Linux containers. This works on all platforms, but on Windows you need to tell your Docker process that you are dealing with Linux containers. (It's a dropdown in the tray.)

It takes a bit of work to understand docker and the only way to understand it is to get your hands dirty. I recommend starting with making an image of an existing project. Make a Dockerfile and play with docker build . etc.

First, to cover the Docker basics (fast version):

  • In order to run something in Docker, we first need to build an image
  • An image is a collection of files
  • You can add files to an image by writing a Dockerfile
  • Using the FROM keyword on the first line, you extend an existing image, adding new files to it to create a new image
  • When starting a container, we tell it which image to use, and all the files in the image are copied into the container's storage
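The basics above can be sketched end to end like this (the image tag and file names are made up for illustration):

```shell
# A minimal Dockerfile that extends an existing image and adds a file
cat > Dockerfile <<'EOF'
FROM python:3.6
COPY hello.py /hello.py
CMD ["python", "/hello.py"]
EOF

echo 'print("hello from the container")' > hello.py

# Build an image from the Dockerfile in the current directory
docker build -t my-first-image .

# Start a container from that image; the files baked into
# the image are available inside it
docker run --rm my-first-image
```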

The simplest way to get files inside a container:

  • Create your own image using a Dockerfile and copy the files in
  • Map a directory on your computer/server into the container
  • You can also use docker cp to copy files from and to a container, but that's not very practical in the long run.
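As a sketch of the docker cp option (the container name and file names are made up):

```shell
# Host -> container: copy a file into a container named "mycontainer"
docker cp ./notes.txt mycontainer:/tmp/notes.txt

# Container -> host: copy it back out
docker cp mycontainer:/tmp/notes.txt ./notes-copy.txt
```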

(docker-compose automates a lot of these things for you, but you should probably also play around with the docker command to understand how things work. A compose file is basically a format that stores arguments to the docker command so you don't have to write commands that are multiple lines long)

A "simple" way to configure multiple projects in Docker for local development:

In your project directory, add a docker-dev folder (or whatever you want to call it) that contains an environment file and a compose file. The compose file is responsible for telling docker how it should run your projects. You can of course make a compose file for each project, but this way you can run them easily together.

projects/
    docker-dev/
        .env
        docker-compose.yml
    project_a/
        Dockerfile
        # .. all your project files
    project_b/
        Dockerfile
        # .. all your project files

The values in .env are passed as variables to the compose file. We simply add the full path to the projects directory for now.

PROJECT_ROOT=/path/to/your/project/dir

The compose file will describe each of your projects as a "service". We are using compose file version 2 here.

version: '2'
services:
  project_a:
    # Assuming this is a Django project and we override command
    build: ${PROJECT_ROOT}/project_a
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_a:/srv/project_a/
    ports:
      # Map port 8000 in the container to your computer at port 8000
      - "8000:8000"
  project_b:
    # Assuming this is also a Django project; we keep the default command from its Dockerfile
    build: ${PROJECT_ROOT}/project_b
    volumes:
      # Map the local source inside the container
      - ${PROJECT_ROOT}/project_b:/srv/project_b/

This will tell docker how to build and run the two projects. We are also mapping the source on your computer into the container so you can work on the project locally and see instant updates in the container.
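If you want to check that the ${PROJECT_ROOT} variables from .env are interpolated the way you expect, docker-compose can print the resolved file:

```shell
# Run from the docker-dev directory; prints the compose file
# with all ${...} variables substituted from .env
docker-compose config
```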

Now we need to create a Dockerfile for each of our projects, or Docker will not know how to build the image for the project.

Example of a Dockerfile:

FROM python:3.6

COPY requirements.txt /requirements.txt

RUN pip install -r requirements.txt
# Copy the project into the image
# We don't need that now because we are mapping it from the host
# COPY . /srv/project_a

# If we need to expose a network port, make sure we specify that
EXPOSE 8000

# Set the current working directory
WORKDIR /srv/project_a

# Assuming we run django here
CMD python manage.py runserver 0.0.0.0:8000

Now we enter the docker-dev directory and try things out. Build one project at a time:

docker-compose build project_a
docker-compose build project_b

To start a project in background (detached) mode:

docker-compose up -d project_a

To jump inside a running container:

docker-compose exec project_a bash

Or just run the container in the foreground:

docker-compose run project_a

There is a lot of ground to cover, but hopefully this can be useful.

In my case I run a ton of web servers of different kinds. This gets really frustrating if you don't set up a proxy in Docker so you can reach each container using a virtual host name. You can, for example, use jwilder/nginx-proxy (https://hub.docker.com/r/jwilder/nginx-proxy/) to solve this in a super-easy way. You can edit your own hosts file and make fake name entries for each container (just add a .dev suffix so you don't override real DNS names).

The jwilder/nginx-proxy container will automagically send you to a specific container based on the virtual host name you decide. Then you no longer need to map ports to your local computer, except for the nginx container, which maps to port 80.
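A minimal sketch of that setup (the image and host names here are made up; VIRTUAL_HOST is the environment variable nginx-proxy watches):

```shell
# Start the proxy; it watches the Docker socket and routes requests by VIRTUAL_HOST
docker run -d -p 80:80 \
  -v /var/run/docker.sock:/tmp/docker.sock:ro \
  jwilder/nginx-proxy

# Start a web container with a VIRTUAL_HOST; the proxy picks it up automatically
docker run -d -e VIRTUAL_HOST=project-a.dev my-project-a-image

# Add a fake hosts entry so the browser can resolve the name
echo "127.0.0.1 project-a.dev" | sudo tee -a /etc/hosts
```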

Upvotes: 1
