Reputation: 101
When submitting condor jobs, a few jobs typically fail for unknown reasons and have to be resubmitted. So I was wondering: what's the most efficient way of resubmitting failed condor jobs, i.e. without having to fish them out one by one and resubmit them?
I tried to grep all the failure messages and extract the job IDs, but that is time-consuming to manipulate.
Upvotes: 0
Views: 598
Reputation: 733
How is the job failing? If it fails with a non-zero exit code, try setting
max_retries = 5
in your condor_submit file. That way, if the job exits with a non-zero exit code, condor will re-run it up to five times until it exits zero.
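For context, here is a minimal sketch of a submit description file with the retry setting in place. It assumes the max_retries submit command documented for condor_submit. The executable, the file names, and the job count are hypothetical placeholders, not taken from the question.

```
# Hypothetical submit description file; all names below are placeholders.
executable = run_analysis.sh
arguments  = $(Process)

# Per-job stdout/stderr, plus one event log for the whole cluster.
output     = out.$(Cluster).$(Process).txt
error      = err.$(Cluster).$(Process).txt
log        = jobs.$(Cluster).log

# Re-run a job up to five times if it does not exit with the
# success exit code (0 unless configured otherwise).
max_retries = 5

# Submit ten jobs, numbered 0 through 9 via $(Process).
queue 10
```

This only covers jobs that fail by returning a non-zero exit code. Jobs that fail for other reasons, such as being held, would need a different approach.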
Upvotes: 0