DSKim

Reputation: 595

How to run a Python program on Condor?

I am new to Condor and am trying to run my Python program on it, but I am having difficulty doing so. All the tutorials I have found assume a single-file Python program, but my program consists of multiple packages and files, and it also uses libraries such as numpy and scipy. In that case, how can I make Condor run my program? Should I convert it into some kind of executable? Or is there a way of transferring my Python source code to a Condor machine and having the Python installation there run it?

Thanks,

Upvotes: 5

Views: 4408

Answers (3)

Charlie Parker

Reputation: 5251

TL;DR: put the full path to the right python interpreter at the top of your Python submission script.

I really do not understand how Condor works, but it seems that once I put the path to the right python (the one for the current environment) at the top, it started working. So check where your python command is:

(automl-meta-learning) miranda9~/automl-meta-learning $ which python
~/miniconda3/envs/automl-meta-learning/bin/python

then copy-paste that path, prefixed with #!, as the first line of your Python submission script:

#!/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python

I wish I could include all of this in the job.sub. If you know how, please let me know.
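One untested guess at how it might be folded into job.sub (a sketch reusing the paths from this answer, not something I have verified): point Executable at the conda python directly and pass the script via Arguments:

# Untested sketch: run the conda python directly from the submit file
# instead of relying on the shebang line inside the .py script.
# Paths are the same ones used elsewhere in this answer.
Executable = /home/miranda9/miniconda3/envs/automl-meta-learning/bin/python
Arguments  = /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py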


In case my submission script is helpful to you:

####################
#
# Experiments script
# Simple HTCondor submit description file
#
# reference: https://gitlab.engr.illinois.edu/Vision/vision-gpu-servers/-/wikis/HTCondor-user-guide#submit-jobs
#
# chmod a+x test_condor.py
# chmod a+x experiments_meta_model_optimization.py
# chmod a+x meta_learning_experiments_submission.py
# chmod a+x download_miniImagenet.py
#
# condor_submit -i
# condor_submit job.sub
#
####################

# Executable   = meta_learning_experiments_submission.py
# Executable = automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
# Executable = ~/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
Executable = /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py

## Output Files
Log          = condor_job.$(CLUSTER).log.out
Output       = condor_job.$(CLUSTER).stdout.out
Error        = condor_job.$(CLUSTER).err.out

# Use this to make sure 1 gpu is available. The key words are case insensitive.
Request_gpus = 1
# requirements = ((CUDADeviceName = "Tesla K40m")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.gpus >= Requestgpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
# requirements = (CUDADeviceName == "Tesla K40m")
# requirements = (CUDADeviceName == "Quadro RTX 6000")
requirements = (CUDADeviceName != "Tesla K40m")

# Note: to use multiple CPUs instead of the default (one CPU), use request_cpus as well
Request_cpus = 8

# E-mail option
Notify_user = [email protected]
Notification = always

Environment = MY_CONDOR_JOB_ID=$(CLUSTER)

# "Queue" means add the setup until this line to the queue (needs to be at the end of script).
Queue

Since I said I use a Python submission script, here is the top of it:

#!/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python

import torch
import torch.nn as nn
import torch.optim as optim
# import torch.functional as F
from torch.utils.tensorboard import SummaryWriter 

I do not submit a bash script with arguments; the arguments live inside my Python script. I don't know how to use bash, so this works better for me.
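For example, the top of the script can define the arguments directly instead of parsing them from the command line (a sketch with placeholder names and values, not my actual settings):

# Sketch: "arguments" hard-coded inside the script rather than passed
# on the command line; the names and values here are placeholders.
from argparse import Namespace

args = Namespace(epochs=100, lr=1e-3, batch_size=32)
print(args.epochs)  # use args.* throughout the script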


Reference solution: https://stackoverflow.com/a/64484025/1601580

Upvotes: 1

pangyuteng

Reputation: 1839

By the way, jobs can now be executed in Docker containers via HTCondor!

https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf
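A minimal sketch of what a docker-universe submit file can look like (the image name and script name are placeholders, not taken from the slides above):

# Sketch of a docker-universe submit file; python:3.9 and
# my_script.py are placeholder names. The script needs a
# #!/usr/bin/env python3 shebang and the executable bit set.
universe      = docker
docker_image  = python:3.9
executable    = my_script.py
output        = job.out
error         = job.err
log           = job.log
queue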

An alternative to using Docker (which I don't recommend, but I had to do it because, several years ago, Condor did not support Docker) is to use a virtual environment. I would create an Anaconda virtual environment in a folder that all Condor nodes can access. Each job then needs to activate that environment before it runs its Python code.
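A sketch of what each job's wrapper could look like, assuming the shared install lives under a path like /shared/miniconda3 (all names here are placeholders):

#!/bin/bash
# Sketch: activate a conda environment that lives on shared storage,
# then run the job. /shared/miniconda3, myenv, and my_script.py are
# placeholder names.
source /shared/miniconda3/etc/profile.d/conda.sh
conda activate myenv
python my_script.py "$@"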

Upvotes: 0

jpatton

Reputation: 403

Your jobs will need to bring an entire python installation (including SciPy and NumPy) with them. This involves building a python installation in a local directory (possibly in an interactive HTCondor job), installing whatever libraries you need within this local python install, then creating a tarball of the install that you include as transfer_input_files. You'll have to use a wrapper script in your job that un-tars your python install and points your job to the correct python executable before running your python scripts.
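A sketch of such a wrapper script, assuming the tarball is named python.tar.gz and unpacks into a python/ directory (both names are placeholders):

#!/bin/bash
# Sketch: unpack the self-contained python install shipped via
# transfer_input_files, put it on PATH, then run the job script.
# python.tar.gz and my_script.py are placeholder names.
tar -xzf python.tar.gz
export PATH=$PWD/python/bin:$PATH
python3 my_script.py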

Here is one cluster's explanation of how to do this: http://chtc.cs.wisc.edu/python-jobs.shtml

Upvotes: 2
