Reputation: 121
I have around 30 machines and I plan to use them all to run a distributed training Python script. They all have the same username, in case that helps. I am on Windows and need to use the GPU, in case that helps too. They all have the same Python version and the necessary software installed, but I need them all to have the same versions of the modules installed (pandas, matplotlib, etc.).
My approach: I initially used one machine to run python -m venv myenv and then did pip install -r requirements.txt. I put the folder on a network drive and had every machine change directory to that network drive. Since they all have the same username, I thought it would be OK. It worked on a couple of them, but not on all of them. The alternative would be to have every machine run python -m venv myenv followed by pip install -r requirements.txt, but wouldn't that be less than ideal? What if I have to add a module? Does anyone have any suggestions?
EDIT: I am hoping for an alternative solution to Docker.
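For reference, here is roughly what the per-machine alternative would look like as a small bootstrap script each machine runs locally (the network share path and local env directory are placeholders I made up); re-running it after requirements.txt changes would pick up any newly added modules:
# bootstrap_env.py - sketch of the "each machine builds its own venv" approach.
# The network share path and the local env directory below are made-up placeholders.
import subprocess
import sys
from pathlib import Path

REQUIREMENTS = Path(r"\\fileserver\share\requirements.txt")  # shared requirements file on the network drive
ENV_DIR = Path(r"C:\envs\myenv")                             # local venv on each machine

def main() -> None:
    if not ENV_DIR.exists():
        # Create the venv with this machine's own interpreter.
        subprocess.run([sys.executable, "-m", "venv", str(ENV_DIR)], check=True)
    venv_python = ENV_DIR / "Scripts" / "python.exe"  # "bin/python" on Linux
    # (Re-)install whatever the shared requirements file currently lists.
    subprocess.run([str(venv_python), "-m", "pip", "install", "-r", str(REQUIREMENTS)], check=True)

if __name__ == "__main__":
    main()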
Upvotes: 1
Views: 124
Reputation: 2407
30 machines is quite a lot of machines to manage by hand. Presumably you have SSH access to them and are thinking of some solution for batch-deploying the application to all of them.
This sounds like a perfect use case for Ansible. If you're not familiar with it, I recommend checking it out and investigating how well it works with Windows hosts (personally, I have only used it with Linux machines; see the FAQ for Windows). It has a bit of a learning curve, but as soon as you have prepared a working playbook for deploying the application to 1 machine, you can just extend the list of hosts and deploy the exact same thing to all 30.
Basically, I would create a playbook that makes each machine create its own local virtual environment with the dependencies, but manage them all from a single control node.
Sample playbook:
---
- name: Do stuff
  gather_facts: false
  hosts: all
  vars:
    virtual_environment: "/home/pi/test_env"
  tasks:
    - name: Install low-level dependencies
      become: true
      apt:
        name: libatlas-base-dev
        update_cache: yes
    - name: Install pip dependencies
      pip:
        name:
          - pandas
          - matplotlib
          - numpy
        virtualenv: "{{ virtual_environment }}"
        virtualenv_command: python3 -m venv
    - name: Execute some script
      command:
        cmd: |
          {{ virtual_environment }}/bin/python -c '
          import pandas as pd
          print("Pandas is installed under:", pd.__file__)
          '
      register: script
    - name: Print the output
      debug:
        msg: "{{ script.stdout }}"
Sample inventory:
192.168.0.24
Output:
$ ansible-playbook -i inventory playbook.yml
PLAY [Do stuff] **************************************************************************
TASK [Install low-level dependencies] ****************************************************
ok: [192.168.0.24]
TASK [Install pip dependencies] **********************************************************
ok: [192.168.0.24]
TASK [Execute some script] ***************************************************************
changed: [192.168.0.24]
TASK [Print the output] ******************************************************************
ok: [192.168.0.24] => {
"msg": "Pandas is installed under: /home/pi/test_env/lib/python3.7/site-packages/pandas/__init__.py"
}
PLAY RECAP *******************************************************************************
192.168.0.24 : ok=4 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Note how most tasks in this run resulted in "ok" rather than "changed" - this is because I had already run this playbook before, and this time Ansible noticed that the packages were already installed, so there was no need to do it all over again. Most Ansible modules are built this way, which makes them efficient. Refer to the (pretty good!) docs to figure out what parameters are available for each module (in the example above, the modules are: apt, pip, command, debug).
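For example, extending the sample inventory above to cover all 30 machines is just a matter of listing them one host per line (addresses here are made up):
192.168.0.24
192.168.0.25
192.168.0.26
# ...one line for each of the 30 machines
The same ansible-playbook command then applies the whole play to every host in the list.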
Upvotes: 1
Reputation: 3373
Although this is slightly outside the realm of pure Python, creating a Docker container for the execution, with the setup captured in build instructions (a Dockerfile), would allow complete isolation of that environment from the surrounding machine's setup.
Docker also allows multiple instances running on the same box to share the same definitions, so if you're running 5 copies of the same code, it can, IIRC, reuse the same memory for the shared in-memory code runtimes. I've read this, but haven't tried out the memory behaviour myself.
Upvotes: 1
Reputation: 588
Maybe you could use a conda environment and then export it:
conda-env export -n myenvname > myenvfile.yml
and then import it on the other machines:
conda-env create -n venv -f=myenvfile.yml
or, alternatively, you could use Docker and share the image on all the machines.
Upvotes: 1