don_vanchos

Reputation: 1540

HTCondor: How to wait until all jobs are completed in the parallel universe? ParallelShutdownPolicy does not work

I'm using the HTCondor Python API with a simple parallel task:

import htcondor

schedd = htcondor.Schedd()  # local schedd

with schedd.transaction() as schedd_transaction:
    sub = htcondor.Submit(
        {
            "universe": "parallel",
            "executable": "/bin/ping",
            "machine_count": "1",
            "request_cpus": "0",
            "error": ".test.err",
            "output": ".test.out",
            "log": ".test.log",
            "should_transfer_files": "NO",
            "transfer_executable": "False",
            "run_as_owner": "True",
            "+Owner": '"user"',
            "+ParallelShutdownPolicy": "WAIT_FOR_ALL",
        }
    )
    # queue one submission per item of item data
    res = sub.queue_with_itemdata(
        schedd_transaction,
        1,
        iter(
            [
                {
                    "arguments": "-c3 127.0.0.1",
                    "initial_dir": "/tmp/tmp1",
                },
                {
                    "arguments": "-c10 127.0.0.1",
                    "initial_dir": "/tmp/tmp2",
                },
            ]
        ),
    )

After running watch -n 0.5 condor_q -nobatch -verbose -allusers I see the following (screenshot of the condor_q output):

The job with ID 2.1 ends prematurely! Why is this happening?

The output of condor_q -analyze during task execution:

root@b0d6b2e00bc8:/# condor_q -analyze 2

007.000:  Job is running.

Last successful match: Mon Jul 29 18:47:50 2019


007.000:  Run analysis summary ignoring user priority.  Of 3 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      2 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job


007.001:  Job is running.


007.001:  Run analysis summary ignoring user priority.  Of 3 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      2 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job
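
For reference, here is how I check from Python whether the whole cluster is done, instead of watching condor_q by hand. This is only a rough sketch: wait_for_cluster is my own helper (not an HTCondor API), it simply polls the schedd, and it assumes completed jobs leave the queue:

import time

import htcondor

schedd = htcondor.Schedd()

def wait_for_cluster(cluster_id, poll_interval=5.0):
    # Poll until no job of this cluster is left in the queue.
    constraint = f"ClusterId == {cluster_id}"
    while schedd.query(constraint=constraint):
        time.sleep(poll_interval)

# res is the SubmitResult returned by queue_with_itemdata() above:
# wait_for_cluster(res.cluster())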

Upvotes: 0

Views: 377

Answers (1)

till

Reputation: 610

This was actually answered on the htcondor-user mailing list.

"+ParallelShutdownPolicy": f'"WAIT_FOR_ALL"',

should do the trick by making the attribute value a quoted string.
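
For context, applying that change to the submit dictionary from the question would look roughly like this (only the changed attribute is shown; all other keys stay exactly as in the question):

import htcondor

sub = htcondor.Submit(
    {
        # ... all other keys exactly as in the question ...
        # The embedded double quotes turn the value into a ClassAd string
        # literal; without them it is parsed as a ClassAd expression.
        "+ParallelShutdownPolicy": '"WAIT_FOR_ALL"',
    }
)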

Upvotes: 1
