ashleydb
ashleydb

Reputation: 21

AppEngine Flex Node.js timeout error 4 when deploying

Whenever I try to deploy my updated AppEngine node.js website (was node 6.14.x, but I've updated to 12.x as part of my attempts to fix this) I am getting a timeout error. Running the site locally, or on the Google Cloud Shell works fine immediately and I can hit the health check url too.

I usually use the Google Cloud Shell to deploy my prod environment, but have also tried from my local machine and got the same results. In either case, for days I've been getting this error:

googlecloudsdk.api_lib.app.operations_util.OperationError: Error Response: [4] Timed out waiting for the flex deployment to become network provisioned.
ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the flex deployment to become network provisioned.

On one attempt from the Google Cloud Shell I got the following error:

Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

I tried updating the app_start_timeout_sec to 600 and got the same error "Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section."

Then I tried updating it to 1800 and just got the original "ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the app infrastructure to become healthy."

Obviously each deploy just takes longer to fail.

Verbose logging from the deploy doesn't really show anything useful, but see below. I'm not sure if there are other places I should be looking for clues, but from poking around nothing is jumping out as an issue.

I have looked at several tickets on the Google Issue Tracker (e.g. https://issuetracker.google.com/issues/73583699) and tried some of the suggestions, but I don't see anything wrong (nothing in logs about resources, noting in settings about quotas being reached.) I opened a ticket there but they closed it saying

"I tried to reproduce the issue on my end by making a few App Engine deployments but couldn't get to the error message you experienced. Moreover, I haven't found other cases or issues reporting the same behavior, so as of now the issue seems isolated to your environment."

Caveats: This is apparently the first deploy I've attempted to production in 2020, so maybe something changed that I need to do? Looks like 2019-10-12 was the last build.

So, any hints on what I can do to find the error, or how I can resolve it?

Thanks for the help.

app.yaml

(Update, I've also tried commenting out the readiness_check section, but it makes no difference).

# [START runtime]
runtime: nodejs
env: flex
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5
  cool_down_period_sec: 180
  cpu_utilization:
    target_utilization: 0.8
readiness_check:
  path: "/health"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
env_variables:
  NODE_ENV: 'production'
  FIREBASE_SERVICE_PROJECT_ID: 'blah'
  FIREBASE_SERVICE_PRIVATE_KEY_ID: 'blah'
  FIREBASE_SERVICE_PRIVATE_KEY: "blah"
  FIREBASE_SERVICE_CLIENT_EMAIL: 'blah'
  FIREBASE_SERVICE_CLIENT_ID: 'blah'
  FIREBASE_SERVICE_CLIENT_X509_CERT_URL: 'blah'
  GOOGLE_CLOUD_STORAGE_BUCKET: 'blah'
  AUTH_SECRET: 'blah'
  AUTH_USERNAME: 'blah'
  AUTH_PASSWORD: 'blah'
  LEAGUE_TABLE_URL: 'blah'
# [END runtime]

Verbose "gcloud app deploy" log

...
Updating service [default] (this may take several minutes)...⠏DEBUG: Operation [apps/daggers-72eb6/operations/b7ef6a92-1ce7-457a-a85a-ef32ae235cf9] not complete. Waiting to retry.       
Updating service [default] (this may take several minutes)...⠏DEBUG: Operation [apps/daggers-72eb6/operations/b7ef6a92-1ce7-457a-a85a-ef32ae235cf9] complete. Result: {                   
    "done": true,
    "error": {
        "code": 4,
        "message": "Timed out waiting for the flex deployment to become network provisioned."
    },
    "metadata": {
        "@type": "type.googleapis.com/google.appengine.v1.OperationMetadataV1",
        "insertTime": "2020-07-14T22:46:01.763Z",
        "method": "google.appengine.v1.Versions.CreateVersion",
        "target": "apps/blah/services/default/versions/20200714t154223",
        "user": "blah"
    },
    "name": "apps/blah/operations/blah"
}
Updating service [default] (this may take several minutes)...failed.                                                                                                                      
DEBUG: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the flex deployment to become network provisioned.
Traceback (most recent call last):
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 983, in Execute
    resources = calliope_command.Run(cli=self, args=args)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 808, in Run
    resources = command_instance.Run(args)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/surface/app/deploy.py", line 117, in Run
    default_strategy=flex_image_build_option_default))
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 651, in RunDeploy
    ignore_file=args.ignore_file)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/command_lib/app/deploy_util.py", line 437, in Deploy
    extra_config_settings)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/appengine_api_client.py", line 208, in DeployService
    poller=done_poller)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 314, in WaitForOperation
    sleep_ms=retry_interval)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
    sleep_ms, _StatusUpdate)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
    sleep_ms=sleep_ms)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
    if not should_retry(result, state):
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
    return not poller.IsDone(operation)
  File "/Users/ashleydb/Applications/google-cloud-sdk/lib/googlecloudsdk/api_lib/app/operations_util.py", line 183, in IsDone
    encoding.MessageToPyValue(operation.error)))
googlecloudsdk.api_lib.app.operations_util.OperationError: Error Response: [4] Timed out waiting for the flex deployment to become network provisioned.
ERROR: (gcloud.app.deploy) Error Response: [4] Timed out waiting for the flex deployment to become network provisioned.

Updates

Everything works fine locally. Here is a list of things I have tried, none of which have resolved the deployment issue:

Upvotes: 1

Views: 1585

Answers (2)

ashleydb
ashleydb

Reputation: 21

Finally resolved. Turns out one of the env vars I had configured in my app.yaml (a private key for accessing firebase) had been changed from being wrapped in single quotes to double quotes ('blah' to "blah") which was enough to break the variable apparently.

The fact that everything worked locally but not when deploying was pointing at the app.yaml for sure, but it was frustrating to not get any other error output about why the instance wasn't starting up.

Upvotes: 1

GAEfan
GAEfan

Reputation: 11360

You have set up your readiness_check to go to the url /health:

readiness_check:
  path: "/health"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300

Make sure you have a URL handler for that, so that if you navigate to yourapp.appspot.com/health, it returns a valid 200 response.

Upvotes: 0

Related Questions