Reputation: 121
I have around 30 machines and I plan to use them all to run a distributed training Python script. They all have the same username, in case that helps. I am on Windows and need to use the GPU, in case that helps too. They all have the same Python version and the necessary software installed, but I need them all to have the same versions of the modules installed (pandas, matplotlib, etc.).
My approach: I initially used one machine to run python -m venv myenv and then did pip install -r requirements.txt. I put the folder on a network drive and had every machine change directory to that network drive. Since they all have the same username, I thought it would be OK. It worked on a couple of them, but not on all of them. The alternative would be to have every machine run python -m venv myenv followed by pip install -r requirements.txt, but wouldn't that be less than ideal? What if I have to add a module? Does anyone have any suggestions?
EDIT: I am hoping for an alternative solution to Docker.
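For reference, here is roughly what the per-machine alternative would look like as a small bootstrap script each machine runs locally (the network share path and local env directory are placeholders I made up); re-running it after requirements.txt changes would pick up any newly added modules:
# bootstrap_env.py - sketch of the "each machine builds its own venv" approach.
# The network share path and the local env directory below are made-up placeholders.
import subprocess
import sys
from pathlib import Path

REQUIREMENTS = Path(r"\\fileserver\share\requirements.txt")  # shared requirements file on the network drive
ENV_DIR = Path(r"C:\envs\myenv")                             # local venv on each machine

def main() -> None:
    if not ENV_DIR.exists():
        # Create the venv with this machine's own interpreter.
        subprocess.run([sys.executable, "-m", "venv", str(ENV_DIR)], check=True)
    venv_python = ENV_DIR / "Scripts" / "python.exe"  # "bin/python" on Linux
    # (Re-)install whatever the shared requirements file currently lists.
    subprocess.run([str(venv_python), "-m", "pip", "install", "-r", str(REQUIREMENTS)], check=True)

if __name__ == "__main__":
    main()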
Upvotes: 1
Views: 124
Reputation: 2407
30 machines is quite a lot of machines to manage by hand. Presumably you have SSH access to them and are thinking of some solution for batch-deploying the application to all of them.
This sounds like a perfect use case for Ansible. If you're not familiar with it, I recommend checking it out and investigating how well it works with Windows hosts (personally, I have only used it with Linux machines; see the FAQ for Windows). It has a bit of a learning curve, but as soon as you have prepared a working playbook for deploying the application to 1 machine, you can just extend the list of hosts and deploy the exact same thing to all 30.
Basically, I would create a playbook that makes each machine create its own local virtual environment with the dependencies, but manage them all from a single control node.
Sample playbook:
---
- name: Do stuff
  gather_facts: false
  hosts: all
  vars:
    virtual_environment: "/home/pi/test_env"
  tasks:
    - name: Install low-level dependencies
      become: true
      apt:
        name: libatlas-base-dev
        update_cache: yes
    - name: Install pip dependencies
      pip:
        name:
          - pandas
          - matplotlib
          - numpy
        virtualenv: "{{ virtual_environment }}"
        virtualenv_command: python3 -m venv
    - name: Execute some script
      command:
        cmd: |
          {{ virtual_environment }}/bin/python -c '
          import pandas as pd
          print("Pandas is installed under:", pd.__file__)
          '
      register: script
    - name: Print the output
      debug:
        msg: "{{ script.stdout }}"
Sample inventory:
192.168.0.24
Output:
$ ansible-playbook -i inventory playbook.yml
PLAY [Do stuff] **************************************************************************
TASK [Install low-level dependencies] ****************************************************
ok: [192.168.0.24]
TASK [Install pip dependencies] **********************************************************
ok: [192.168.0.24]
TASK [Execute some script] ***************************************************************
changed: [192.168.0.24]
TASK [Print the output] ******************************************************************
ok: [192.168.0.24] => {
"msg": "Pandas is installed under: /home/pi/test_env/lib/python3.7/site-packages/pandas/__init__.py"
}
PLAY RECAP *******************************************************************************
192.168.0.24 : ok=4 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Note how most tasks in this run resulted in "ok" rather than "changed" - this is because I had already run this playbook before, and this time Ansible noticed that the packages were already installed, so there was no need to do it all over again. Most Ansible modules are built this way, which makes them efficient. Refer to the (pretty good!) docs to figure out what parameters are available for each module (in the example above, the modules are: apt, pip, command, debug).
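For example, extending the sample inventory above to cover all 30 machines is just a matter of listing them one host per line (addresses here are made up):
192.168.0.24
192.168.0.25
192.168.0.26
# ...one line for each of the 30 machines
The same ansible-playbook command then applies the whole play to every host in the list.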
Upvotes: 1
Reputation: 3373
Although this is slightly outside the realm of pure Python, creating a Docker container for the execution, with the setup captured in build instructions (a Dockerfile), would allow complete isolation of that environment from the surrounding machine's setup.
Docker also allows multiple instances running on the same box to share the same definitions, so if you're running 5 copies of the same code, it can, IIRC, reuse the same memory for the shared in-memory code runtimes. I've read this, but haven't tried out the memory behaviour myself.
Upvotes: 1
Reputation: 588
Maybe you could use a conda environment and then export it:
conda-env export -n myenvname > myenvfile.yml
and then import it on the other machines:
conda-env create -n venv -f=myenvfile.yml
or, alternatively, you could use Docker and share the image on all the machines.
Upvotes: 1