famzah

Reputation: 1706

GCE Instance Group stuck for hours in ZONE_RESOURCE_POOL_EXHAUSTED

I have an Instance Group configured to deploy in all zones: "europe-west3 (3/3 zones)". Instance redistribution is "On". Autoscaling and autohealing are "Off" (development environment). The Number of instances is 1. Google Cloud Status Dashboard shows that everything is working OK.

Today I tried a rolling REPLACE twice, once in the morning and once in the evening. Both attempts failed to create a new VM, with this error:

The zone 'projects/xxx/zones/europe-west3-c' does not have enough resources available to fulfill the request. Try a different zone, or try again later.

Why wouldn't GCE Instance Group choose a suitable zone automatically where there are enough resources? It retries for hours in the same zone. I can see the log in the "Errors" tab.

Is this a bug in my Instance Group configuration, or is it a bug in GCE? Do you think Autoscaling would behave in the same ridiculous way and is therefore unreliable, too?

Upvotes: 1

Views: 1941

Answers (1)

Sohail Alvi

Reputation: 373

If you cannot create an instance because of a ZONE_RESOURCE_POOL_EXHAUSTED or ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS error, the zone cannot currently accommodate your request. The error is caused by a lack of available Compute Engine resources in the zone, not by your Compute Engine quota [1].

Here are some tips to help mitigate:

  1. Because this situation is temporary and can change frequently based on fluctuating demand, try your request again later.
  2. If possible, try to create the resources in another zone in the region or in another region.
  3. If possible, change the shape of the VM you are requesting. It's easier to get smaller machine types than larger ones. A change to your request, such as reducing the number of GPUs or using a custom VM with less memory or vCPUs, might allow your request to proceed.
  4. Use Compute Engine reservations to reserve resources within a zone to ensure that the resources you need are available when you need them.
  5. If you are trying to create a preemptible instance, remember that preemptible VMs are spare capacity and so might not be obtainable at peak demand periods.
  6. If you receive a "notFound" or "does not exist in zone" error when requesting new resources, it means that the zone does not offer the resource or machine type that you have requested. See Regions and zones to find out which features are available in each zone.
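
If you end up creating the VM yourself rather than waiting for the Instance Group, tips 1 and 2 can be combined into a simple loop: try every zone in the region once, and only if all of them are exhausted wait and go around again. Below is a minimal sketch of that idea, not something the Instance Group does for you. The `create_in_zone` callable and the `CreateError` class are hypothetical placeholders for whatever client or CLI wrapper you actually use; only the two error codes come from the text above.

```python
import time
from typing import Callable, Optional, Sequence

# The two error codes named above; both mean the zone is out of capacity.
STOCKOUT_CODES = {
    "ZONE_RESOURCE_POOL_EXHAUSTED",
    "ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS",
}


class CreateError(Exception):
    """Hypothetical error type that carries the Compute Engine error code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def create_with_zone_fallback(
    create_in_zone: Callable[[str], str],  # hypothetical: creates a VM in the given zone
    zones: Sequence[str],
    rounds: int = 3,
    base_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each zone in order; if all are exhausted, back off and try again."""
    last_error: Optional[CreateError] = None
    for attempt in range(rounds):
        for zone in zones:
            try:
                return create_in_zone(zone)
            except CreateError as err:
                if err.code not in STOCKOUT_CODES:
                    # Quota, permission, or similar errors: another zone will not help.
                    raise
                last_error = err
        if attempt < rounds - 1:
            # Every zone was exhausted; wait longer before each new round.
            sleep(base_delay_s * 2 ** attempt)
    if last_error is None:
        raise ValueError("zones must be non-empty and rounds must be at least 1")
    raise last_error
```

You would call it with something like `create_with_zone_fallback(my_create, ["europe-west3-c", "europe-west3-a", "europe-west3-b"])`, where `my_create` is your own function. Errors other than a stockout are re-raised immediately, since retrying or switching zones does not fix them.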

It is recommended to deploy and balance your workload across multiple zones and regions. This reduces the likelihood of an outage and gives you access to multiple resource pools when you need to expand quickly. Please review the documentation [2], which outlines how to build resilient and scalable architectures on Google Cloud Platform. Please note that you are currently using Google Cloud on-demand, without a guarantee of capacity. We now offer a feature called reservations that guarantees Google Cloud capacity; see the documentation [3] for details on how to use it.

Upvotes: 1
