Reputation: 3611
Let me explain...
I have 2 SQS queues that receive requests for execution of light and heavy report-generating jobs. (The separation into two queues has been introduced for the light jobs not to be influenced by the heavy ones.)
The jobs from these queues are processed by an auto-scaling group that contains 3 workers.
The workers are on-demand EC2 instances. I would like to change the launch configuration and use spot instances.
The thing is that some heavy report-generating jobs may run for up to 4 hours. So if such a job runs on a spot instance worker that gets terminated, additional delays and/or complications will arise.
I would like to use spot instances as workers but also to have the assurance that the worker will not be terminated if there is a job running on it.
The approaches I came up with are the following:
1. Bid for the spot instances with the on-demand price of the instance [it still does not protect from termination but minimises the possibility]
2. Use spot instances with a specific duration [e.g. 6 hours], but then I am still confined to 6 hours, after which the instance terminates. Plus, I don't know if I can configure this from the launch configuration
Upvotes: 0
Views: 1248
Reputation: 34297
I would like to use spot instances as workers but also to have the assurance that the worker will not be terminated if there is a job running on it.
You seem to understand that this is not the way that spot instances work.
They are yours until you are outbid.
The 6 hour thing ("defined duration") might help in some cases, I suppose.
Two ideas spring to mind:
1. Try to estimate the length of the job in the "long" queue before it starts, then pick the cheapest option to run it.
2. Implement a transactional system for your jobs. For example, when a job is pulled off the SQS queue, add the time, instance ID and job ID to another persistent system, e.g. a database table. Then have something poll the table every few minutes and check that the instance is still there. When the job finally completes successfully, have the job runner remove it from the database table. If the polling notices that the instance has gone away, resubmit the job to the SQS queue.
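A minimal sketch of the second idea, to make the bookkeeping concrete. Everything here is an assumption for illustration: the table name `running_jobs`, the function names, and the use of SQLite as the persistent store are all hypothetical. The list of live instance IDs and the `resubmit` callable are passed in, so in a real setup they would presumably be backed by an EC2 instance lookup and an SQS send respectively.

```python
import sqlite3
import time
from typing import Callable, Iterable, List, Optional


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical table that tracks in-flight jobs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS running_jobs ("
        " job_id TEXT PRIMARY KEY,"
        " instance_id TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " started_at REAL NOT NULL)"
    )
    conn.commit()
    return conn


def claim_job(conn: sqlite3.Connection, job_id: str, instance_id: str,
              body: str, now: Optional[float] = None) -> None:
    """Record that a worker instance has pulled a job off the queue."""
    started_at = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO running_jobs VALUES (?, ?, ?, ?)",
        (job_id, instance_id, body, started_at),
    )
    conn.commit()


def complete_job(conn: sqlite3.Connection, job_id: str) -> None:
    """Remove a job from the table once it has finished successfully."""
    conn.execute("DELETE FROM running_jobs WHERE job_id = ?", (job_id,))
    conn.commit()


def resubmit_orphans(conn: sqlite3.Connection,
                     live_instance_ids: Iterable[str],
                     resubmit: Callable[[str], None]) -> List[str]:
    """Resubmit every tracked job whose instance is no longer alive.

    Returns the IDs of the jobs that were resubmitted.
    """
    live = set(live_instance_ids)
    rows = conn.execute(
        "SELECT job_id, instance_id, body FROM running_jobs ORDER BY job_id"
    ).fetchall()
    orphaned = []
    for job_id, instance_id, body in rows:
        if instance_id not in live:
            resubmit(body)              # put the job back on the queue
            complete_job(conn, job_id)  # stop tracking the lost attempt
            orphaned.append(job_id)
    return orphaned


if __name__ == "__main__":
    conn = open_registry()
    claim_job(conn, "job-1", "i-aaa", "heavy report 1")
    claim_job(conn, "job-2", "i-bbb", "heavy report 2")
    requeued: List[str] = []
    # Pretend only i-aaa is still running; i-bbb was reclaimed.
    print(resubmit_orphans(conn, ["i-aaa"], requeued.append))
    print(requeued)
```

The poller would call `resubmit_orphans` every few minutes. Since the job is resubmitted before its row is deleted, a crash in between can lead to the same job being queued twice, so the jobs should be safe to run more than once.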
Upvotes: 1