andy

Reputation: 195

How do I solve the 'Request was aborted after waiting too long to attempt to service your request' error generated by App Engine?

In general we have around 2 requests / second. However, after we pushed a notification to 3000 users, traffic suddenly jumped to 120 requests / second. Unfortunately, around half of those users got 5XX server errors, meaning half of the users who showed up saw blank pages. Once the spike passed, no server errors occurred again.

I did some research, and it seems the cause is startup time: instances were taking too long to start, so the waiting requests were aborted. I checked the instance count: as many as 90 instances were created, but active instances dropped from 40 to 0 after a second. The problem only occurred during a sudden increase in requests, but I thought App Engine was supposed to be able to handle this kind of increase.

My question is: how can I fix this problem? Or where should I keep digging to find its root cause? Thanks in advance!

Upvotes: 3

Views: 7841

Answers (4)

Sam Spade

Reputation: 1486

Not necessarily the solution, but worth checking: make sure you're listening on the port specified by the environment variable provided by Google. This solved it for me.
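As a rough illustration of that check, here is a minimal sketch using only the Python standard library. It assumes the variable is named `PORT` and falls back to 8080 when it is unset; the `get_port`, `Hello`, and `make_server` names are hypothetical and exist only for this example.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_port(environ=os.environ, default=8080):
    """Read the listening port from the environment (assumed name: PORT)."""
    return int(environ.get("PORT", default))


class Hello(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with a plain-text 'ok'."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(environ=os.environ):
    """Bind on all interfaces at the port taken from the environment."""
    return HTTPServer(("0.0.0.0", get_port(environ)), Hello)


# Usage (blocks forever, so it is left commented out):
# make_server().serve_forever()
```

If the app hard-codes a port that differs from the one the platform expects, health checks and requests may never reach it, which can look like instances that start and then immediately drop out.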

Upvotes: -1

andy

Reputation: 195

Thank you all for the help, I've figured out the problem.

Credit goes to Dan Cornilescu: his comments gave me the leads I needed to find the root of the problem, which was that I did not have enough min_idle_instances. Once I set min_idle_instances high enough in the automatic scaling section of my app.yaml, I no longer received any 5XX server errors.

Upvotes: 2

Caner

Reputation: 59148

If you are experiencing high traffic, then maybe now is a good time to run load tests. Try to simulate real-world traffic as closely as possible, and look for bottlenecks using Stackdriver Trace or by profiling the request handling and database operations in your code.

Also check your project scaling settings in your yaml file, especially these parameters:

automaticScaling:
  coolDownPeriod: 120s
  cpuUtilization:
    targetUtilization: 0.5
  maxTotalInstances: 8
  minTotalInstances: 1

Upvotes: 0

Alex

Reputation: 5276

Which 5XX codes were you seeing?

I experienced an issue with instances mysteriously hanging and dying on start-up:

app engine instance dies instantly, locking up deferred tasks until they hit 10 minute timeout

It was due to a third-party library I was using, which was trying to bind to a port during instantiation. I ended up editing the source code of that library.

I've also experienced crashes after an instance sent its ~20th push notification to APNS, due to a memory leak in App Engine's version of Python's ssl library.

Your issue is a bit different from these, but the steps to hunt it down are the same:

  1. Set up a sandbox by deploying your project to a different project ID and reproduce the issue there. A script that hits this sandbox with thousands of requests over the course of a few minutes from your local machine should do it.
  2. Comment parts of your code out, deploy again to the sandbox, and see if it still crashes. Repeat until your script no longer crashes it.

Proceeding by process of elimination like this should lead you to what's causing the issue, by ruling out everything that isn't.

You can also do this in the opposite direction: start from a 'hello world' type project and systematically copy-paste chunks of your application code in until the issue starts happening.
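The request script from step 1 could look roughly like the sketch below. It is only an illustration using the Python standard library: the function names, the sandbox URL, and the request and concurrency numbers are all placeholders, not anything from the original setup.

```python
import collections
import concurrent.futures
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status of one GET, or 'error' on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4XX/5XX responses are raised as exceptions
    except (urllib.error.URLError, OSError):
        return "error"  # DNS failure, refused connection, timeout, ...


def run_burst(url, total_requests, concurrency, fetch=fetch_status):
    """Send total_requests GETs, at most `concurrency` at once; count statuses."""
    counts = collections.Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        for status in pool.map(fetch, [url] * total_requests):
            counts[status] += 1
    return counts


def server_error_rate(counts):
    """Fraction of all requests that came back with a 5XX status."""
    total = sum(counts.values())
    errors = sum(n for status, n in counts.items()
                 if isinstance(status, int) and 500 <= status < 600)
    return errors / total if total else 0.0


# Usage against a hypothetical sandbox deployment (placeholder URL):
# results = run_burst("https://YOUR-SANDBOX-PROJECT.appspot.com/", 3000, 100)
# print(dict(results), server_error_rate(results))
```

Rerunning the same burst after each deploy gives a comparable 5XX rate, which makes it easier to tell whether the change you just commented out (or pasted in) made a difference.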

Upvotes: 1
