Reputation: 2499
Sometimes one of my AWS instances uses 100% of its CPU because of a conflict between the hardware and software virtualization. More precisely, the process called "System interrupts" consumes all CPU resources. This is solved simply by stopping the instance, waiting a while, and then starting it again (it then starts up on other hardware; a plain restart does not work).
My question is: what is the easiest way to automatically stop the AWS instance and start it again after 1 minute when the system is unresponsive? Can this be done within the AWS ecosystem itself?
The not-so-easy way would be to have another system ping the server and, if it doesn't respond, execute a custom script with the needed actions. But I'm hoping an easier solution exists.
Upvotes: 1
Views: 603
Reputation: 269171
You can create an Amazon CloudWatch Alarm in the Amazon EC2 management console:
Take the action: Reboot this instance
This will attempt a graceful restart of the operating system, but will force the restart if necessary.
You can configure the alarm to trigger after a given period of 100% CPU. Just be careful that it doesn't trigger when the instance is simply doing "real" work. You might need to play around with the alarm settings to get it just right.
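If you would rather script the alarm than click through the console, something like the following boto3 sketch should be roughly equivalent. The alarm name, the 99% threshold, and the three 5-minute evaluation periods are assumptions chosen for illustration, not values from the console walkthrough; tune them so normal load does not trip the alarm.

```python
def reboot_alarm_params(instance_id, region, threshold=99.0,
                        period=300, evaluation_periods=3):
    """Build the arguments for CloudWatch's PutMetricAlarm call.

    The alarm name and the default threshold/period values are
    illustrative assumptions, not recommended settings.
    """
    return {
        "AlarmName": f"high-cpu-reboot-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,                          # seconds per datapoint
        "EvaluationPeriods": evaluation_periods,   # consecutive breaching periods
        "Threshold": threshold,                    # percent CPU
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 alarm action that reboots the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:reboot"],
    }


def create_reboot_alarm(instance_id, region):
    """Create the alarm; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**reboot_alarm_params(instance_id, region))
```

With the defaults above, the alarm fires only after 15 minutes of sustained CPU at or above 99%, which is one way to avoid reacting to short bursts of real work.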
Upvotes: 0
Reputation: 35146
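Create a CloudWatch Alarm for when the instance's CPU utilization reaches a certain percentage.
Add an action for the ALARM state that triggers a Lambda function. The Lambda would use the AWS SDK to call StopInstances, sleep for 1 minute, then call StartInstances. Note that StartInstances fails if the instance has not yet reached the stopped state, so the function should wait for the stop to complete first.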
Example function: https://www.howtoforge.com/aws-lambda-function-to-start-and-stop-ec2-instance/
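A minimal sketch of such a Lambda in Python with boto3 might look like this. It is not the code from the linked article; the `INSTANCE_ID` environment variable and the `stop_then_start` helper are assumptions made for illustration, and the 60-second pause simply mirrors the 1 minute mentioned in the question.

```python
import os
import time


def stop_then_start(ec2, instance_id, pause_seconds=60, sleep=time.sleep):
    """Stop the instance, wait until it is fully stopped, pause, then start it.

    `ec2` is a boto3 EC2 client (or any object with the same methods).
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # StartInstances is rejected while the instance is still "stopping",
    # so block until EC2 reports it as "stopped".
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    sleep(pause_seconds)
    ec2.start_instances(InstanceIds=[instance_id])


def lambda_handler(event, context):
    """Lambda entry point, invoked when the CloudWatch alarm fires."""
    import boto3  # available by default in the Lambda Python runtime

    # Assumption: the target instance ID is supplied through a
    # hypothetical INSTANCE_ID environment variable on the function.
    instance_id = os.environ["INSTANCE_ID"]
    stop_then_start(boto3.client("ec2"), instance_id)
    return {"restarted": instance_id}
```

For this to work, the function's timeout has to be raised well above the 3-second default (stopping alone can take a minute or more), and its execution role needs `ec2:StopInstances`, `ec2:StartInstances`, and `ec2:DescribeInstances`, the last of which the waiter uses to poll the instance state.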
Upvotes: 2