Mohan

Handle AWS Spot Instance Termination

I am using Spot Instances to run some batch jobs.

However, lately we have been seeing a lot of Spot Instance terminations, and we want to use the 2-minute interruption notice that AWS sends before an instance is terminated.

Sources:

My approach here was to run a separate thread in my application that polls the instance metadata URL http://169.254.169.254/latest/meta-data/spot/instance-action to check whether a termination notice has been sent, and then raises an exception (or re-triggers the current job).
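For reference, that endpoint returns HTTP 404 while no interruption is scheduled; once one is, it returns a small JSON document such as `{"action": "terminate", "time": "2017-09-18T08:22:00Z"}`. Below is a minimal sketch of interpreting such a response, assuming the status code and body have already been fetched. The helper names `parse_instance_action` and `seconds_left` are hypothetical, not part of any AWS library.

```python
import json
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_instance_action(status_code: int, body: str) -> Optional[Tuple[str, datetime]]:
    """Return (action, time) if an interruption is scheduled, else None.

    Assumes the documented behaviour of the spot/instance-action endpoint:
    404 when nothing is scheduled, otherwise JSON with "action" and "time".
    """
    if status_code != 200:
        # 404 means no interruption is scheduled; any other status is
        # treated as "no notice" here (a real monitor might log it).
        return None
    doc = json.loads(body)
    action = doc["action"]  # e.g. "stop", "terminate" or "hibernate"
    # The timestamp is UTC with a trailing "Z"; older fromisoformat() needs "+00:00".
    when = datetime.fromisoformat(doc["time"].replace("Z", "+00:00"))
    return action, when


def seconds_left(when: datetime, now: Optional[datetime] = None) -> float:
    """Seconds remaining until the scheduled action (negative if already past)."""
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()
```

Using the `time` field rather than assuming a full two minutes lets the job decide how much cleanup it can still fit in.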

My code

interruption_monitor.py

import requests
import log  # project-specific logging helper (not the standard library)
from time import sleep
from threading import Thread


class InstanceTerminated(Exception):
    """Instance Terminated Exception class"""


class InterruptionMonitor(Thread):
    """Threaded Interruption monitor"""

    def __init__(self, sleep_time=0.1, report_time=5):
        super().__init__(daemon=True)
        self.sleep_time = min(sleep_time, report_time)
        self.count = 0
        self.logger = log.get_logger(f"my_app.{__name__}")

    def check_interruption_notice(self):
        """Check for interruption Notice"""
        self.logger.info("CHECKING FOR INTERRUPTION NOTICE...")
        url = 'http://169.254.169.254/latest/meta-data/spot/instance-action'
        response = requests.get(url=url, timeout=5)
        self.logger.info("RESPONSE:", resp=response)
        # 404 means no interruption is scheduled; 200 carries a JSON body
        if response.status_code == 200:
            # requests.Response has no .action attribute; read it from the JSON body
            action = response.json().get('action')
            if action in ('stop', 'terminate'):
                raise InstanceTerminated(action)  # Or retrigger the Job

    def run(self):
        """Entry point for thread execution"""
        while True:
            self.check_interruption_notice()
            sleep(self.sleep_time)
            self.count += 1
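One thing worth knowing about this design: an exception raised inside a thread's `run()` ends only that thread (Python reports it through `threading.excepthook`); it does not propagate to the main thread that is running the batch job. A common alternative is for the monitor to set a `threading.Event` that the job checks between units of work. The sketch below assumes a hypothetical injected `fetch` callable returning `(status_code, body)`, so it can be exercised without EC2; the class name is also hypothetical.

```python
import threading
from typing import Callable, Optional, Tuple

# A fetcher returns (status_code, body) for the instance-action URL.
Fetcher = Callable[[], Tuple[int, str]]


class EventInterruptionMonitor(threading.Thread):
    """Polls for a spot interruption notice and sets an Event instead of raising."""

    def __init__(self, fetch: Fetcher, poll_interval: float = 5.0):
        super().__init__(daemon=True)
        self.fetch = fetch
        self.poll_interval = poll_interval
        self.interrupted = threading.Event()  # set once a notice is seen
        self.notice_body: Optional[str] = None  # raw JSON of the notice
        self._stop_requested = threading.Event()

    def run(self) -> None:
        while not self._stop_requested.is_set():
            try:
                status, body = self.fetch()
            except Exception:
                # Transient failure (e.g. a network error): try again next round.
                status, body = None, ""
            if status == 200:
                self.notice_body = body
                self.interrupted.set()
                return  # one notice is enough; stop polling
            # Event.wait() returns early if stop() is called, unlike time.sleep().
            self._stop_requested.wait(self.poll_interval)

    def stop(self) -> None:
        self._stop_requested.set()
```

A real fetcher could wrap `requests.get(url, timeout=2)` and return `(response.status_code, response.text)`. In the batch job, checking `monitor.interrupted.is_set()` between work items gives a safe point to checkpoint or re-queue the job.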


There are two questions I am looking for answers to:

  1. Is this the correct way of handling this? Is there any added cost, or would this affect my existing job performance in any way? If yes, how? If not, please suggest a better approach.

  2. I am not able to test the positive scenario, because I have to wait for AWS to interrupt my Spot Instances to see whether it works as expected. Is there a way to trigger a Spot Instance termination manually, so that I receive the interruption notice and can verify that this works?

PS: I am a noob with AWS, so please bear with me.


Answers (1)

Mark B

Would there be any added cost, or would this affect my existing job performance in any way?

There is no added cost for running another thread in your current ECS processes. Why would there be? Please take the time to understand how ECS bills you if you are concerned about that. ECS doesn't bill per thread.

There could definitely be a performance hit if you poll too often. Your default polling interval of 0.1 seconds is way too fast. I don't understand what you are doing with the sleep_time and report_time values, but the AWS documentation you linked to recommends polling every 5 seconds, not every 0.1 seconds.
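To put the interval in perspective: with a roughly two-minute notice, polling every `T` seconds means a notice is seen at most `T` seconds late, and the monitor makes about `120 / T` metadata requests per 120-second window. A small sketch of that trade-off; the function name and the fixed 120-second window are illustrative assumptions.

```python
NOTICE_WINDOW_S = 120.0  # assumed length of the interruption notice (about two minutes)


def polling_tradeoff(interval_s: float) -> dict:
    """Requests per notice window versus worst-case time left after detection."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        "requests_per_window": NOTICE_WINDOW_S / interval_s,
        "worst_case_delay_s": interval_s,
        "min_time_left_s": NOTICE_WINDOW_S - interval_s,
    }
```

For example, `polling_tradeoff(5.0)` gives 24 requests per window and still leaves at least 115 seconds after detection, while a 0.1-second interval makes 1,200 requests to gain under 5 seconds of extra warning.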

Is there a way to trigger a Spot Instance termination manually, so that I receive the interruption notice and can verify that this works?

Unfortunately, there is no way to manually trigger that on ECS that I am aware of.
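As a local workaround for exercising the positive path, one option is to stand up a tiny HTTP server that mimics the instance-action endpoint and point the polling code at it. This only tests how your code handles a notice, not AWS's actual behaviour. The sketch uses only the standard library; the handler, helper names, and the sample JSON values are assumptions for illustration.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Body shaped like the documented instance-action response (values are made up).
FAKE_NOTICE = {"action": "terminate", "time": "2017-09-18T08:22:00Z"}
NOTICE_PATH = "/latest/meta-data/spot/instance-action"


class FakeMetadataHandler(BaseHTTPRequestHandler):
    """Serves a fake interruption notice on NOTICE_PATH and 404 everywhere else."""

    def do_GET(self):
        if self.path == NOTICE_PATH:
            body = json.dumps(FAKE_NOTICE).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_fake_metadata_server():
    """Start the fake server on a free localhost port; returns (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeMetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def fetch(url):
    """Return (status_code, body) for url, treating HTTP errors as plain statuses."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, ""
```

For this to work, the monitor's URL would need to be configurable (for example, a hypothetical `url` constructor argument) instead of hard-coding the link-local address, so a test can pass `base_url + NOTICE_PATH`.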
