Itsik Belson

Reputation: 115

Root cause and retry of "The request was aborted because there was no available instance." error in Cloud Functions

From time to time we see bursts of errors in our Cloud Functions: "The request was aborted because there was no available instance." with HTTP response 500, which suggests that Cloud Functions cannot keep up with the rate of traffic. This happens for Cloud Functions triggered by changes in Firestore, RTDB, and Pub/Sub, and even for scheduled functions. According to the troubleshooting guide, this can happen due to a sudden increase in traffic, long cold starts, or long request processing times. We also understand that it is good practice to use an exponential backoff retry mechanism where it is important that the Cloud Function executes. We know it is not a max-instances issue, as we did not set a limit for these functions, and the error is a 500 rather than a 429.

Questions:

  1. Can we identify the underlying root cause, e.g. is it a cold start, or is it a long-running function that causes it?
  2. When functions fail due to cold-start time, does the cold start include only the time it takes to provision the instance and deploy the code to it, or also the initial execution of the runtime environment (e.g. node index.js), which also executes the code in the global scope?
  3. Cloud Functions have a retry-on-failure configuration. Does it also cover the "no available instance" case we experienced?

Upvotes: 1

Views: 224

Answers (1)

Sathi Aiswarya

Reputation: 2905

This error can be caused by one of the following:

  • A huge sudden increase in traffic.
  • A long cold start time.
  • A long request processing time.
  • Transient factors attributed to the Cloud Run service.

As mentioned in this GitHub post, Cloud Run does not mark request logs with information about whether they caused a cold start. However, Stackdriver, a suite of monitoring tools (Stackdriver Logging, Stackdriver Error Reporting, Stackdriver Monitoring), can help you understand what is going on in your Cloud Functions. It has built-in tools for logging, error reporting, and monitoring. Apart from Stackdriver, you can view execution times, execution counts, and memory usage in the GCP console. You can refer to Stackdriver Logging and Stackdriver Trace for Cloud Functions, and to Error Reporting.

A cold start includes both the time it takes to provision the instance and the initial execution of the runtime environment, which also runs the code in the global scope. I think the retry-on-failure configuration does not cover the "no available instance" case.
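Since the built-in retry setting probably does not help here, the exponential backoff mentioned in the question would have to live in whatever calls the function. Below is a minimal sketch of that idea in Python, assuming the caller is your own code invoking the function over HTTP (for event-triggered functions the invocation comes from Google's infrastructure, so this would not apply directly). The names call_with_backoff and NoAvailableInstance are hypothetical and only stand in for your own client code and for a 500/429 response.

```python
import random
import time


class NoAvailableInstance(Exception):
    """Hypothetical error standing in for an HTTP 500/429 from the function."""


def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Run call() and retry it with exponential backoff and full jitter.

    call is any zero-argument callable that raises NoAvailableInstance
    when the request should be retried (an assumption of this sketch).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except NoAvailableInstance:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The cap doubles each time (0.5s, 1s, 2s, ...) up to max_delay;
            # the random factor spreads retries out so a burst of failed
            # callers does not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to imitate a short burst of errors.
        state["calls"] += 1
        if state["calls"] < 3:
            raise NoAvailableInstance("no available instance")
        return "ok"

    print(call_with_backoff(flaky, base_delay=0.01))
```

The jitter matters most in exactly the situation described in the question: if many callers fail at once during a traffic spike, retrying them all after the same delay would recreate the spike.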

I have found this GitHub post and an Issue Tracker entry raised for a similar issue, which is still open. If you are still facing the issue, you can follow that entry for future updates and also add your concerns there.

Upvotes: 1
