Alexander Gannouni

Reputation: 61

App Engine Flex deployment health check fails

I've made a Python 3 Flask app, served with gunicorn, that acts as an API proxy. I've deployed the OpenAPI spec to Cloud Endpoints and filled in the endpoints service in the app.yaml file.

When I try to deploy to App Engine flex, the health check fails because it takes too long. I've tried adjusting the readiness_check's app_start_timeout_sec as suggested, but to no avail. When checking the logs in Stackdriver, I can only see gunicorn booting a couple of workers and eventually terminating everything, several times in a row, with no further explanation of what goes wrong. I've also tried specifying resources in the app.yaml and scaling the workers in the gunicorn.conf.py file, but that didn't help either.

Then I tried switching to uWSGI, but it behaved the same way: starting up and terminating a couple of times in a row, followed by the health check timeout.

error:

ERROR: (gcloud.app.deploy) Error Response: [4] Your deployment has failed to become healthy in the allotted time and therefore was rolled back. If you believe this was an error, try adjusting the 'app_start_timeout_sec' setting in the 'readiness_check' section.

app.yaml

runtime: python
env: flex
entrypoint: gunicorn -c gunicorn.conf.py -b :$PORT main:app

runtime_config:
    python_version: 3

endpoints_api_service:
  name: 2019-09-27r0
  rollout_strategy: managed

resources:
  cpu: 1
  memory_gb: 2
  disk_size_gb: 10

gunicorn.conf.py:

import multiprocessing

bind = "127.0.0.1:8000"
workers = multiprocessing.cpu_count() * 2 + 1
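To get more detail than "booting worker / terminating", gunicorn can be told to log more verbosely. The following is a sketch of an alternative gunicorn.conf.py, assuming that App Engine flexible passes the port in the PORT environment variable (8080 by default) and that access and error logs written to stdout/stderr end up in the App Engine logs. The log level and the bodies of the `worker_abort` and `worker_exit` hooks are illustrative choices, not part of the original setup.

```python
# gunicorn.conf.py -- sketch of a more verbose configuration (values are
# illustrative assumptions, not taken from the original question).
import multiprocessing
import os

# Assumption: the platform supplies the port via PORT (8080 by default).
# Bind to all interfaces instead of 127.0.0.1. A -b/--bind flag on the
# command line still takes precedence over this value.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")
workers = multiprocessing.cpu_count() * 2 + 1

# "-" sends the logs to stdout/stderr; debug level while troubleshooting.
accesslog = "-"
errorlog = "-"
loglevel = "debug"


def worker_abort(worker):
    # Server hook: called when a worker receives SIGABRT, which gunicorn
    # normally sends when a worker exceeds its timeout.
    worker.log.warning("worker %s aborted (timeout?)", worker.pid)


def worker_exit(server, worker):
    # Server hook: called by gunicorn just after a worker has exited.
    server.log.info("worker %s exited", worker.pid)
```

If workers are being killed for exceeding the timeout, the `worker_abort` line should make that visible instead of a silent restart.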

requirements.txt:

aniso8601==8.0.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
Flask==1.1.1
Flask-Jsonpify==1.5.0
Flask-RESTful==0.3.7
gunicorn==19.9.0
idna==2.8
itsdangerous==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
pytz==2019.2
requests==2.22.0
six==1.12.0
urllib3==1.25.5
Werkzeug==0.16.0
pyyaml==5.1.2

Can anyone spot a conflict or something I've missed here? I'm out of ideas and really need help. It would also help a lot if someone could point me to where I can find more information in the logs (I also ran gcloud app deploy with --verbosity=debug, but it only shows "Updating service [default]... ...Waiting to retry."). I would really like to know what causes the health checks to time out!

Thanks in advance!

Upvotes: 0

Views: 3124

Answers (1)

Joss Baron

Reputation: 1524

You can either disable health checks or customize them.

To disable them, add the following to your app.yaml:

health_check:
  enable_health_check: False

To customize them, have a look at split health checks.

You can customize liveness check requests by adding an optional liveness_check section to your app.yaml file, for example:

liveness_check:
  path: "/liveness_check"
  check_interval_sec: 30
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2

The documentation lists all the settings available for liveness checks.
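Once a path is set in liveness_check, the check request is, as far as the split health check docs describe it, forwarded to your app and counts as successful only if it gets a 200 OK back. So the Flask app has to answer that path. Below is a minimal sketch of a WSGI middleware that does this; the class name `LivenessMiddleware` is hypothetical, and the path is assumed to match the `/liveness_check` value in the example above.

```python
# Hypothetical WSGI middleware: answers the liveness path configured in
# app.yaml with 200 OK and passes every other request to the wrapped app.


class LivenessMiddleware:
    def __init__(self, app, path="/liveness_check"):
        self.app = app    # wrapped WSGI app, e.g. Flask's app.wsgi_app
        self.path = path  # must match liveness_check.path in app.yaml

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == self.path:
            # Keep this cheap: it only reports that the process is alive.
            start_response("200 OK", [("Content-Type", "text/plain")])
            return [b"ok"]
        return self.app(environ, start_response)
```

In main.py this would be wired up with the usual Flask middleware pattern, `app.wsgi_app = LivenessMiddleware(app.wsgi_app)`, so the `main:app` entrypoint in app.yaml keeps working unchanged. A plain `@app.route("/liveness_check")` view would work just as well; the middleware form only keeps health checks out of the API code.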

There are also readiness checks, whose settings you can customize in the same way, for example:

readiness_check:
  path: "/readiness_check"
  check_interval_sec: 5
  timeout_sec: 4
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: 300
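On the app side, a readiness endpoint is usually meant to fail until start-up work is finished, so traffic is not routed to an instance that cannot serve it yet. A sketch under that assumption, again as a WSGI middleware: the name `ReadinessMiddleware` and the `ready` flag are hypothetical, and the path is assumed to match the `/readiness_check` value above.

```python
import threading


class ReadinessMiddleware:
    """Hypothetical sketch: return 503 on the readiness path until the app
    signals that start-up work (loading config, warming caches, ...) is done."""

    def __init__(self, app, path="/readiness_check"):
        self.app = app
        self.path = path                # must match readiness_check.path
        self.ready = threading.Event()  # call ready.set() once start-up ends

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == self.path:
            if self.ready.is_set():
                start_response("200 OK", [("Content-Type", "text/plain")])
                return [b"ready"]
            start_response("503 Service Unavailable",
                           [("Content-Type", "text/plain")])
            return [b"starting"]
        return self.app(environ, start_response)
```

The flag lives in process memory, so with several gunicorn workers each worker holds its own copy and has to call `ready.set()` itself. If start-up takes longer than app_start_timeout_sec allows, the deployment is rolled back, which matches the error in the question.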

You can adjust the liveness_check and readiness_check values to your needs. Review them carefully, since App Engine Flexible takes several minutes to start up an instance; this is a notable difference from App Engine Standard and should not be taken lightly.
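A rough way to reason about these numbers is simple arithmetic on the example values. The sketch below is an approximation under stated assumptions: checks fire every check_interval_sec, a state change needs failure_threshold (or success_threshold) consecutive results, and timeout_sec, jitter and the redundant health checkers are ignored. The helper names are hypothetical.

```python
# Rough health-check timing estimates (approximation; ignores timeout_sec,
# jitter and the extra health checkers mentioned in the answer).


def failure_window_sec(check):
    # Approximate time of continuous failures before the check flips to failed.
    return check["failure_threshold"] * check["check_interval_sec"]


def startup_budget_sec(check):
    # Approximate time the app has to start answering readiness checks with
    # 200 while still collecting success_threshold consecutive successes
    # before app_start_timeout_sec runs out.
    needed = check["success_threshold"] * check["check_interval_sec"]
    return check["app_start_timeout_sec"] - needed


liveness = {"check_interval_sec": 30, "timeout_sec": 4,
            "failure_threshold": 2, "success_threshold": 2}
readiness = {"check_interval_sec": 5, "timeout_sec": 4,
             "failure_threshold": 2, "success_threshold": 2,
             "app_start_timeout_sec": 300}

print(failure_window_sec(liveness))   # 60
print(startup_budget_sec(readiness))  # 290
```

With the example values, the app would need to be answering the readiness path within roughly 290 seconds of the start-up window opening.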

If you examine the nginx.health_check logs for your application, you might see health check polling happening more frequently than you have configured, due to the redundant health checkers that are also following your settings. These redundant health checkers are created automatically and you cannot configure them.

Upvotes: 1
