masa_ekohe

Reputation: 79

How can I automatically reboot a GCE instance after it crashes?

I think I want a shell script that automatically reboots an instance regardless of whether it crashed because of a software problem or a hardware problem.

The OS is Ubuntu 18.04.

Upvotes: 0

Views: 372

Answers (2)

mebius99

Reputation: 2605

GCE provides Managed Instance Groups (MIGs) with an autohealing feature that can be useful for both stateless and stateful workloads.

For a stateful workload that you expect might crash, you should take preventive measures to protect data consistency: regular commits, transaction logs kept on fast, reliable, write-optimized storage with the write-back cache disabled, snapshots, initdbscript, and so on, much as you would on bare-metal systems.

Next, you will need health checks that can tell a healthy instance from a failed one (as Kolban comprehensively advises in the other answer). You should have separate health checks for load balancing and for autohealing.

Finally, create a MIG with health checks and autohealing configured according to your needs.
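As a rough illustration of that last step, here is a minimal sketch of the request body you could send to the Compute Engine `instanceGroupManagers.patch` REST method to attach an autohealing policy to an existing MIG. The helper name `autohealing_patch_body`, the example health check path, and the 300-second default delay are assumptions made for illustration; only the `autoHealingPolicies`, `healthCheck`, and `initialDelaySec` field names come from the API.

```python
import json


def autohealing_patch_body(health_check_url: str, initial_delay_sec: int = 300) -> dict:
    """Build a patch body that attaches one autohealing policy to a MIG.

    health_check_url: URL or partial path of an existing health check
        (hypothetical example: projects/my-project/global/healthChecks/autohealing-hc).
    initial_delay_sec: grace period after instance creation before the
        health check result is acted on. 300 is an assumed default.
    """
    if initial_delay_sec < 0:
        raise ValueError("initial_delay_sec must not be negative")
    return {
        "autoHealingPolicies": [
            {
                "healthCheck": health_check_url,
                "initialDelaySec": initial_delay_sec,
            }
        ]
    }


if __name__ == "__main__":
    # Print the body so it can be inspected before sending it to the API.
    body = autohealing_patch_body(
        "projects/my-project/global/healthChecks/autohealing-hc"
    )
    print(json.dumps(body, indent=2))
```

Give the initial delay enough room for your application to finish booting, otherwise the group may recreate instances that are still starting up.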

Please see

Instance Groups: Autohealing

Setting up health checking and autohealing

Upvotes: 0

Kolban

Reputation: 15266

There are likely to be a number of solutions. One that you might consider is Google Stackdriver Monitoring uptime checks, which let you define what it means for a service or Compute Engine instance to be "up". See:

https://cloud.google.com/monitoring/uptime-checks

If the Compute Engine instance does not respond (because it has crashed or is otherwise unavailable), the failed uptime check can raise an alert. The alert triggers a notification channel, which can call a webhook, which in turn can use the Compute Engine management API to stop or restart the instance.

Break your puzzle apart into separate pieces:

  1. How do I detect that a Compute Engine instance is unresponsive or has crashed?
  2. How do I invoke some software / service / task / function that will perform custom logic?
  3. How do I perform logic that will stop / restart a Compute Engine instance?

If you piece these parts together, you should have your solution.
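To make pieces 2 and 3 a little more concrete, here is a minimal sketch of the logic a webhook receiver might run. It assumes the alert arrives as JSON with an `incident` object whose `state` is `"open"` when the check starts failing, and that the project, zone, and instance name are known to the handler. The names `handle_alert` and `reset_instance` are made up for this example, and `reset_instance` relies on the `google-cloud-compute` client library being installed and authenticated.

```python
from typing import Callable


def reset_instance(project: str, zone: str, instance: str) -> None:
    """Hard-reset a VM through the Compute Engine API.

    Imported lazily so the alert-handling logic can be exercised
    without the google-cloud-compute package installed.
    """
    from google.cloud import compute_v1

    client = compute_v1.InstancesClient()
    client.reset(project=project, zone=zone, instance=instance)


def handle_alert(
    payload: dict,
    restart: Callable[[str, str, str], None],
    project: str,
    zone: str,
    instance: str,
) -> bool:
    """Restart the instance only when the alert reports an open incident.

    Returns True if a restart was requested, False otherwise.
    The payload shape (incident.state) is an assumption about the
    webhook notification format; check it against what you receive.
    """
    incident = payload.get("incident", {})
    if incident.get("state") != "open":
        # Closed incidents and unrelated payloads are ignored.
        return False
    restart(project, zone, instance)
    return True
```

In a real deployment you would call `handle_alert(request_json, reset_instance, ...)` from whatever receives the webhook (for example an HTTP-triggered function), and you would probably want to authenticate the caller before acting on the payload.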

Upvotes: 1
