Is it possible to identify a Google Compute Engine onHost error beforehand to perform some pre-restart tasks?

Question

Is it possible to identify a Google Compute Engine onHost error beforehand to perform some pre/post-restart tasks?

If so, what is the procedure to monitor this event?

GalloCedrone · Accepted Answer

No, it is not possible, since a host error is by nature an unexpected event. It should be rare, but when it occurs you cannot take any action in advance.

Keep in mind that even though the instance is in the "Cloud", there is a physical machine running your workload, and in the unlucky case of a hardware failure, or a failure of the virtualisation environment, there is nothing you can do.

To be clearer, there is no way to receive a notice 60 minutes before an "onHost failure" happens, as you can, for example, when your virtual machine cannot be live-migrated during maintenance.
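For contrast, the maintenance notice mentioned above is exposed through the instance metadata server. The sketch below assumes the `maintenance-event` key at `http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event`, queried with the `Metadata-Flavor: Google` header, and the values `NONE`, `MIGRATE_ON_HOST_MAINTENANCE` and `TERMINATE_ON_HOST_MAINTENANCE`; `run_pre_maintenance_tasks` and `watch_forever` are hypothetical names. It only reacts to scheduled maintenance and does not give any advance warning of a host error.

```python
import time
import urllib.request

# Metadata key that announces upcoming host maintenance (queried from inside the VM).
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)
UPCOMING_EVENTS = {"MIGRATE_ON_HOST_MAINTENANCE", "TERMINATE_ON_HOST_MAINTENANCE"}


def needs_pre_maintenance_tasks(event: str) -> bool:
    """True when the metadata value announces maintenance rather than NONE."""
    return event.strip() in UPCOMING_EVENTS


def wait_for_maintenance_event() -> str:
    """Block until the maintenance-event value changes, then return the new value."""
    req = urllib.request.Request(
        METADATA_URL + "?wait_for_change=true",
        headers={"Metadata-Flavor": "Google"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def run_pre_maintenance_tasks() -> None:
    # Hypothetical placeholder: drain connections, flush buffers, etc.
    print("maintenance announced, running pre-maintenance tasks")


def watch_forever() -> None:
    """Loop on the metadata server; only meaningful on a Compute Engine VM."""
    while True:
        try:
            event = wait_for_maintenance_event()
        except OSError:
            # Transient metadata-server or network error (URLError is an OSError); retry.
            time.sleep(5)
            continue
        if needs_pre_maintenance_tasks(event):
            run_pre_maintenance_tasks()
```

On the VM you would call `watch_forever()`, for example from a long-running service; outside Compute Engine the metadata hostname does not resolve and the loop simply keeps retrying.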

Quoting from the documentation:

A host error means that there was a hardware or software issue on the physical machine hosting your virtual machine that caused your virtual machine to crash. When Compute Engine detects such an event, we add a compute.instances.hostError entry to your operations log. If your virtual machine is set to automatically restart, which is the default, Google will also restart your virtual machine on a different physical machine.

In general, physical hardware failures and software failures can happen from time-to-time, but are rare occurrences. To protect your applications and services from potentially disruptive system events like these, make sure you design robust systems and build scalable and resilient web applications. Use managed instance groups to perform health checking and scaling across groups of Compute Engine instances.
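Because the `compute.instances.hostError` entry only appears in the operations log after the crash, the practical hook is on the post-restart side: look for that entry and run your recovery tasks then. A minimal sketch, assuming the `gcloud` CLI is installed and authenticated and that each operation record in its JSON output carries `operationType`, `targetLink` and `insertTime` fields; `host_error_instances` and `fetch_operations` are hypothetical helper names.

```python
import json
import subprocess
from typing import Dict, List, Tuple

HOST_ERROR_TYPE = "compute.instances.hostError"


def host_error_instances(operations: List[Dict]) -> List[Tuple[str, str]]:
    """Return (instance name, insertTime) for every hostError operation."""
    hits = []
    for op in operations:
        if op.get("operationType") != HOST_ERROR_TYPE:
            continue
        # targetLink ends in .../instances/<instance-name>
        name = op.get("targetLink", "").rsplit("/", 1)[-1]
        hits.append((name, op.get("insertTime", "")))
    return hits


def fetch_operations() -> List[Dict]:
    """Read hostError operations as JSON via the gcloud CLI."""
    result = subprocess.run(
        [
            "gcloud", "compute", "operations", "list",
            "--filter=operationType=compute.instances.hostError",
            "--format=json",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For instance, `host_error_instances(fetch_operations())` lists the VMs that were restarted after a host error, which a scheduled job could compare against its previous run. The Python-side filter still works if you drop the `--filter` flag and fetch all operations.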

UPDATE

Compute Engine offers live migration to keep your virtual machine instances running even when a host system event occurs, such as a software or hardware update.

Live migration keeps your instances running during:

Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if the hardware fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.
