Reputation: 1067
I have a pipeline on ADO (Azure DevOps). It's quite simple and there are several steps: 1, 2, 3, 4, etc.
In one of the steps, some code is pushed to SageMaker (AWS cloud solution) and runs there for some hours. I want the ADO pipeline to wait for that to finish before moving on to the next step in the pipeline.
It's basically a Python script, run like: python deploy_to_sagemaker.py algorithm
However, it's abandoning the job after around 40 minutes, probably because of CPU inactivity. Is there any way, in my .yml file or something like that, to tell the pipeline to wait for some hours no matter how little activity there is?
The error message is something like "We stopped hearing from agent id-xxx. Verify the agent machine is running and has a healthy network connection".
Upvotes: 1
Views: 4148
Reputation: 41655
You need to increase the job timeout. From the docs:
Timeouts
To avoid taking up resources when your job is unresponsive or waiting too long, it's a good idea to set a limit on how long your job is allowed to run. Use the job timeout setting to specify the limit in minutes for running the job. Setting the value to zero means that the job can run up to the maximum limit for the agent type (forever on self-hosted agents).
The timeout period begins when the job starts running. It does not include the time the job is queued or is waiting for an agent.
The timeoutInMinutes allows a limit to be set for the job execution time. When not specified, the default is 60 minutes. When 0 is specified, the maximum limit is used (described above).
The cancelTimeoutInMinutes allows a limit to be set for the job cancel time when the deployment task is set to keep running if a previous task has failed. When not specified, the default is 5 minutes. The value should be in range from 1 to 35790 minutes.
jobs:
- job: Test
  timeoutInMinutes: 10 # how long to run the job before automatically cancelling
  cancelTimeoutInMinutes: 2 # how much time to give 'run always even if cancelled tasks' before stopping them
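Applied to the scenario in the question, a minimal sketch might look like the following. The job name, display name, and the 360-minute value are assumptions chosen for illustration; only the script invocation comes from the question.

```yaml
jobs:
- job: DeployToSageMaker        # hypothetical job name
  timeoutInMinutes: 360         # assumed value: allow up to 6 hours instead of the 60-minute default
  cancelTimeoutInMinutes: 5     # same as the documented default, shown only for completeness
  steps:
  - script: python deploy_to_sagemaker.py algorithm
    displayName: Deploy to SageMaker and wait   # hypothetical display name
```

Depending on the agent type, the effective limit may be capped below the value set here, so it is worth checking which limit applies to your agents.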
You can also set the timeout for each task individually - see task control options.
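A step-level timeout could be sketched as below. The display name and the 240-minute value are assumptions for illustration; the script invocation is the one from the question.

```yaml
steps:
- script: python deploy_to_sagemaker.py algorithm
  displayName: Deploy to SageMaker and wait   # hypothetical display name
  timeoutInMinutes: 240                       # assumed value: cancel this step after 4 hours
```

The job-level timeout still applies, so a step cannot run longer than the job that contains it.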
Upvotes: 1