Kladnik
Kladnik

Reputation: 141

Google App Engine restarting automatically randomly

We are running node server on GAE and for some reason a few times a day our server is offline (sometimes it can take a few mins to come back online).

Requests are the same throughout the day and there is also no exception that would be the cause of restart. There is no spike in requests or any special requests that could cause it.

Log when it happens:

2020-04-18T23:48:51.881806Z [GET /v1/util/example [36m304 [35.262 ms - -[ A 
2020-04-18T23:50:17.119906Z [start] 2020/04/18 23:50:17.119185 Quitting on terminated signal A 
2020-04-18T23:50:17.175632Z [start] 2020/04/18 23:50:17.175267 Start program failed: user application failed with exit code -1 (refer to stdout/stderr logs for more detail): signal: terminated 
2020-04-18T23:51:38.772388Z GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 /v1/util/example GET 304 173 B 3.3 s Example-V2/3.1.13 (com.example.app; build:1; iOS 13.4.0) Alamofire/5.1.0 5e9b928a00ff0bc9244f94194c0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:38.786760Z GET 404 324 B 2.4 s Unknown /_ah/start GET 404 324 B 2.4 s Unknown 5e9b928a00ff0c014898f5c27f0001737e737065616b2d76322d32613166310001737065616b2d6170693a323032303034303374303630343431000100
2020-04-18T23:51:39.529080Z [start] 2020/04/18 23:51:39.511828 No entrypoint specified, using default entrypoint: /serve 
2020-04-18T23:51:39.529642Z [start] 2020/04/18 23:51:39.528742 Starting app 
2020-04-18T23:51:39.529968Z [start] 2020/04/18 23:51:39.529100 Executing: /bin/sh -c exec /serve 
2020-04-18T23:51:39.590085Z [start] 2020/04/18 23:51:39.589751 Waiting for network connection open. Subject:"app/invalid" Address:127.0.0.1:8080 
2020-04-18T23:51:39.590571Z [start] 2020/04/18 23:51:39.590347 Waiting for network connection open. Subject:"app/valid" Address:127.0.0.1:8081 
2020-04-18T23:51:39.764383Z [serve] 2020/04/18 23:51:39.763656 Serve started. 
2020-04-18T23:51:39.764935Z [serve] 2020/04/18 23:51:39.764544 Args: {runtimeName:nodejs10 memoryMB:1024 positional:[]} 
2020-04-18T23:51:39.766562Z [serve] 2020/04/18 23:51:39.765904 Running /bin/sh -c exec node server.js 
2020-04-18T23:51:41.072621Z [start] 2020/04/18 23:51:41.071895 Wait successful. Subject:"app/valid" Address:127.0.0.1:8081 Attempts:296 Elapsed:1.481194491s 
2020-04-18T23:51:41.072978Z Express server started on port: 8081 
2020-04-18T23:51:41.073008Z [start] 2020/04/18 23:51:41.072411 Starting nginx 
2020-04-18T23:51:41.085901Z [start] 2020/04/18 23:51:41.085451 Waiting for network connection open. Subject:"nginx" Address:127.0.0.1:8080 
2020-04-18T23:51:41.132064Z [start] 2020/04/18 23:51:41.131572 Wait successful. Subject:"nginx" Address:127.0.0.1:8080 Attempts:9 Elapsed:45.911234ms 
2020-04-18T23:51:41.170786Z [GET /_ah/start [33m404 [11.865 ms - 61[

There is always more than 70% memory free, so that could not be the issue. Only noticed very high CPU utilization when it restarts occurs (10x higher than normally).

In the bottom picture you can clearly see when the restarts happen: CPU utilization

This is my app.yaml

runtime: nodejs10
instance_class: B4
service: example-api

basic_scaling:
  max_instances: 1
  idle_timeout: 30m

handlers:
  - url: .*
    secure: always
    script: auto

This is happening on our production server, so any help would be more than welcome.

Thanks!

Upvotes: 1

Views: 2160

Answers (2)

Oliver Aragon
Oliver Aragon

Reputation: 488

Reading this document, it is mentioned that even though they try to keep basic and manual scaling instances running indefinitely, they are sometimes restarted for maintenance or they might fail due to some other reasons. That is why keeping your max instances as 1 is not considered best practice as it is prone to all of these failures. As mentioned in the other answer, I would also recommend to increase the number of instances so the likelyhood of more failing or being restarted at the same time is lower.

Upvotes: 1

Casper Fabricius
Casper Fabricius

Reputation: 1169

We had the same problem when we migrated our Ruby on Rails app to Google App Engine Standard a year ago. After emailing back and forth with Google Cloud Support, they suggested: "increasing the minimum number of instances will help because you will have more “backup” instances."

At the time we had two instances, and since we upped it the three instances, we have had no downtime related to unexpected server restarts.

We are still not sure why our servers are sometimes deemed unhealthy and restarted by App Engine, but having more instances can help you to avoid downtime in the short run while you investigate the underlying issue.

Upvotes: 1

Related Questions