StackExchanger

Reputation: 101

Resubmit failed condor jobs

When submitting Condor jobs, a few jobs typically fail for unknown reasons and have to be resubmitted. So I was wondering: what is the most efficient way to resubmit failed Condor jobs, i.e. without having to fish them out one by one and resubmit them?

I tried grepping for the failure messages and extracting the job IDs, but the result is time-consuming to work with.

Upvotes: 0

Views: 598

Answers (1)

Greg

Reputation: 733

How is the job failing? If it fails with a non-zero exit code, try setting

max_retries = 5

in your submit description file (the file you pass to condor_submit). That way, if the job exits with a non-zero exit code, Condor will retry it up to five times, stopping as soon as it exits with zero.
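
For context, here is a minimal sketch of where the retry setting would sit in a complete submit description file. It uses max_retries, the name the HTCondor manual documents for the retry limit. The executable name, the file paths, and the job count are hypothetical placeholders, not taken from the question.

```
# Hypothetical submit description file; adjust names and paths to your setup.
executable = run_job.sh
arguments  = $(Process)

# Per-job output files; the out/ and err/ directories are assumed to exist.
output = out/job.$(Cluster).$(Process).out
error  = err/job.$(Cluster).$(Process).err
log    = jobs.$(Cluster).log

# Retry a job that exits with a non-zero exit code, up to five times.
max_retries = 5

# Queue 100 instances of the job (placeholder count).
queue 100
```

Submit it as usual with condor_submit. Note that this only helps when the failure shows up as a non-zero exit code; jobs that fail in other ways (for example, jobs that go on hold) would need a different approach.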

Upvotes: 0
