Reputation: 1540
I am using the Python HTCondor API to submit a simple parallel job:
import htcondor

schedd = htcondor.Schedd()

with schedd.transaction() as shedd_transaction:
    # Settings shared by every job in the cluster
    sub = htcondor.Submit(
        {
            "universe": "parallel",
            "executable": "/bin/ping",
            "machine_count": "1",
            "request_cpus": "0",
            "error": ".test.err",
            "output": ".test.out",
            "log": ".test.log",
            "should_transfer_files": "NO",
            "transfer_executable": "False",
            "run_as_owner": "True",
            "+Owner": f'"user"',
            "+ParallelShutdownPolicy": "WAIT_FOR_ALL",
        }
    )
    # Queue one job per itemdata entry, each with its own arguments and directory
    res = sub.queue_with_itemdata(
        shedd_transaction,
        1,
        iter(
            [
                {
                    "arguments": "-c3 127.0.0.1",
                    "initial_dir": "/tmp/tmp1",
                },
                {
                    "arguments": "-c10 127.0.0.1",
                    "initial_dir": "/tmp/tmp2",
                },
            ]
        ),
    )
When I then run watch -n 0.5 condor_q -nobatch -verbose -allusers, I see:
The job with ID 2.1 ends prematurely! Why is this happening?
The output of condor_q -analyze during task execution:
root@b0d6b2e00bc8:/# condor_q -analyze 2
007.000: Job is running.
Last successful match: Mon Jul 29 18:47:50 2019
007.000: Run analysis summary ignoring user priority. Of 3 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
2 match and are already running your jobs
0 match but are serving other users
1 are able to run your job
007.001: Job is running.
007.001: Run analysis summary ignoring user priority. Of 3 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
2 match and are already running your jobs
0 match but are serving other users
1 are able to run your job
Upvotes: 0
Views: 377
Reputation: 610
This was actually answered on the htcondor-user mailing list.
"+ParallelShutdownPolicy": f'"WAIT_FOR_ALL"',
should do the trick by making the attribute value a quoted string.
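To spell out why the quotes matter: the value of a "+" attribute is inserted into the job ad as a ClassAd expression, so a bare WAIT_FOR_ALL is read as a reference to an attribute of that name rather than as a string. A minimal sketch of building such values safely follows. The helper quote_classad_string and the dictionary name custom_attributes are hypothetical, not part of the htcondor module, and the backslash-style escaping is an assumption about ClassAd string literals.

```python
def quote_classad_string(value: str) -> str:
    """Wrap a Python string as a ClassAd string literal.

    Hypothetical helper for illustration only. It assumes backslash
    escaping for embedded backslashes and double quotes.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Custom "+" attributes hold ClassAd expressions, so string values
# have to carry their own double quotes.
custom_attributes = {
    "+Owner": quote_classad_string("user"),
    "+ParallelShutdownPolicy": quote_classad_string("WAIT_FOR_ALL"),
}

for key, value in custom_attributes.items():
    print(f"{key} = {value}")
```

These entries could then be merged into the dictionary passed to htcondor.Submit in place of the hand-quoted values.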
Upvotes: 1