Reputation: 9
I have a submit file, my_file.sub, for HTCondor with multiple (~100) jobs to run. The file is quite simple (I know it can be simplified with $(Process), but it is generated automatically and cannot be changed):
executable = my_script.sh
arguments = "-q 0 -b file0.txt"
transfer_input_files = file0.txt
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+MayUseAWS = TRUE
error = errfile0.txt
log = logfile0.txt
request_memory = 1000M
#request_cpus = 1
queue
executable = my_script.sh
arguments = "-q 1 -b file1.txt"
transfer_input_files = file1.txt
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+MayUseAWS = TRUE
error = errfile1.txt
log = logfile1.txt
request_memory = 1000M
#request_cpus = 1
queue
... (overall 100 executables)
executable = my_script.sh
arguments = "-q 100 -b file100.txt"
transfer_input_files = file100.txt
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
+MayUseAWS = TRUE
error = errfile100.txt
log = logfile100.txt
request_memory = 1000M
#request_cpus = 1
queue
This file is submitted to HTCondor with condor_submit to run jobs on AWS. Each process uses 1 CPU, but the machine has 4 CPUs.
I want to set this file up so that 4 processes run on one machine in parallel, to reduce computing costs. So instead of running the jobs one after another, I want to run 4 jobs in parallel, then another 4 jobs in parallel, and so on (25 times).
I have read various tutorials, including the DAGMan tutorial, but did not find a way to do this. Is there any way to solve this problem?
P.S. I'm new to HTCondor, but by coincidence I had to start working with it.
Upvotes: 0
Views: 364
Reputation: 733
In HTCondor terminology, each time condor_submit sees one of these queue statements, it creates a job. A job is the atomic unit of work in HTCondor. An HTCondor job can have one or more processes in it, which might use one or more CPU cores.
I assume your question is this: each of these jobs needs only one CPU core (because each job contains one CPU-bound process), you'd really like to run any four of the 100 jobs concurrently on each machine, and you don't care about the order in which the jobs run?
If so, and assuming HTCondor has been told that the worker node has four cores, then HTCondor will do this out of the box, with no additional configuration. Is this not happening now? What does the output of condor_status say about the worker nodes?
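If condor_status shows fewer than four cores in total for the node, the slot setup on the worker may need adjusting. Below is a minimal sketch of a local configuration fragment for the worker node. It assumes you can edit that node's HTCondor configuration; the file location mentioned in the comments is hypothetical and depends on your installation. It is not needed if the defaults already advertise all four cores.

```
# Hypothetical local config file on the worker node,
# for example a file under /etc/condor/config.d/ (location is an assumption).

# Only set NUM_CPUS if core auto-detection reports the wrong number.
NUM_CPUS = 4

# One partitionable slot that owns all of the machine's resources.
# HTCondor carves a piece out of it for each job according to the
# job's request_cpus and request_memory.
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = TRUE
```

Slot changes typically require restarting the HTCondor daemons on that node before they take effect. Also note that each job in the submit file requests 1000M of memory, so the node needs roughly 4000M available to HTCondor for four of these jobs to run at the same time; if it has less, memory rather than CPU count will limit concurrency.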
Upvotes: 0