bballboy8

Reputation: 450

EC2 Instance Status Check Failed

I am running a process on an EC2 server that needs to run continuously in the background. When I try to log in to the server, I keep getting a "Network Error: Connection timed out" prompt. When I check the instance, I get the following message:

Instance reachability check failed at February 22, 2020 at 11:15:00 PM UTC-5 (1 days, 13 hours and 34 minutes ago)

To troubleshoot, I tried rebooting the server, but that did not correct the problem. How do I fix this and prevent it from happening again?

Upvotes: 6

Views: 16536

Answers (2)

Tiago Peres

Reputation: 15451

I've gone through the same problem. Looking at the EC2 dashboard, I could see that something wasn't right with the instance. For me, rebooting the instance and waiting 2-3 minutes solved it, and I was then able to SSH into the instance just fine.

If that becomes a recurring problem, I'll follow Jeremy Thompson's advice:

... put the EC2s in an Auto Scaling Group. The ALB does a health check and, if it fails, will no longer route traffic to that EC2; then the ASG will send a status check and take the unresponsive server out of rotation.

Upvotes: 4

Asfar Irshad

Reputation: 743

An instance status check failure indicates a problem with the instance, such as:

  • Failure to boot the operating system
  • Failure to mount volumes correctly
  • File system issues
  • Incompatible drivers
  • Kernel panic
  • Severe memory pressures

See the following for troubleshooting: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesStopping.html
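To see which check is failing without opening the console, something like the sketch below may help. It assumes you already have the dictionary returned by boto3's `describe_instance_status` call (the EC2 DescribeInstanceStatus API). The helper name `failed_checks` and the instance ID used in the usage note are made up for illustration.

```python
def failed_checks(response):
    """List (instance_id, check_type, detail_name) for every failed status check.

    `response` is assumed to have the shape returned by the EC2
    DescribeInstanceStatus API, e.g. boto3's
    ec2_client.describe_instance_status(...).
    """
    failures = []
    for status in response.get("InstanceStatuses", []):
        # "SystemStatus" covers the underlying host; "InstanceStatus"
        # covers the instance itself (OS boot, networking, and so on).
        for check_type in ("SystemStatus", "InstanceStatus"):
            for detail in status.get(check_type, {}).get("Details", []):
                if detail.get("Status") == "failed":
                    failures.append(
                        (status["InstanceId"], check_type, detail["Name"])
                    )
    return failures
```

With boto3 installed and credentials configured, you could call it as `failed_checks(boto3.client("ec2").describe_instance_status(InstanceIds=["i-0123456789abcdef0"], IncludeAllInstances=True))`, where the instance ID is a placeholder. A failed `reachability` detail under `InstanceStatus` would correspond to the message in the question.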

For future reporting and auto recovery, you can create a CloudWatch alarm.
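A rough sketch of such an alarm, assuming boto3 and the `StatusCheckFailed_Instance` metric with the built-in EC2 reboot alarm action. The function name, alarm name, and the period and evaluation values are my own choices for illustration, not required settings, so adjust them to your case.

```python
def create_reboot_alarm(cloudwatch, instance_id, region):
    """Create an alarm that reboots the instance when its status check fails.

    `cloudwatch` is assumed to be a boto3 CloudWatch client, i.e.
    boto3.client("cloudwatch", region_name=region).
    Returns the parameters that were sent, for logging or inspection.
    """
    params = {
        # Hypothetical naming scheme; any unique alarm name works.
        "AlarmName": f"instance-status-check-failed-{instance_id}",
        "Namespace": "AWS/EC2",
        # Instance-level check (OS/network reachability), as in the question.
        "MetricName": "StatusCheckFailed_Instance",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        # Assumed values: fire after three consecutive failed minutes.
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 alarm action that reboots the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:reboot"],
    }
    cloudwatch.put_metric_alarm(**params)
    return params
```

For a failed system status check (a problem with the underlying host), the usual pairing is the `StatusCheckFailed_System` metric with the `arn:aws:automate:<region>:ec2:recover` action instead. You can also add an SNS topic ARN to `AlarmActions` if you want to be notified.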

As for the second part:

There is nothing you can do to stop it from occurring. For uptime and availability, however, you can create another EC2 instance and put an ALB in front of both instances. The ALB checks the health of each instance, so your users, customers, or services can still be served by the second instance during recovery. You can add as many instances as you want for higher availability (which obviously involves cost).

Upvotes: 7
