Lennart Paar

Reputation: 71

Long-running Cloud Task on GAE flexible terminates early without error. How to debug? What am I missing?

I am running an application on GAE flexible with Python and Flask. I periodically dispatch Cloud Tasks with a cron job. These basically loop through all users and perform some cluster analysis. The tasks terminate without throwing any kind of error, but they don't perform all the work (meaning not all users were looped through). It doesn't happen at a consistent time (anywhere between 276.5 s and 323.3 s), nor does it ever stop at the same user. Has anybody experienced anything similar?

My guess is that I am breaching some type of resource limit or timeout somewhere. Things I have thought about or tried:

Sorry if I am too vague or am completely missing the point, I am quite confused with this problem. Thank you for any pointers.

Upvotes: 1

Views: 562

Answers (2)

Lennart Paar

Reputation: 71

Thank you for all the suggestions. I played around with them and found the root cause, although only by accident while reading the Firestore documentation. I had no indication that this had anything to do with Firestore.

From https://googleapis.dev/python/firestore/latest/collection.html I found out that the response stream behind Query.stream() (or Query.get()) has a timeout, and documents not yet read from the iterator when it expires are dropped. The documentation says:

Note: The underlying stream of responses will time out after the max_rpc_timeout_millis value set in the GAPIC client configuration for the RunQuery API. Snapshots not consumed from the iterator before that point will be lost.

So what eventually timed out was the query over all users. I came across this by chance; none of the errors I caught pointed me back towards the query. Hope this helps someone in the future!
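One possible way around this is to avoid holding a single stream open for the whole run and instead read the users in small pages, so that each underlying query is consumed quickly. The following is only a sketch, not what I necessarily ended up with: `iter_in_pages` and `page_size` are hypothetical names, and `collection` is assumed to be a Firestore collection reference or query exposing `order_by()`, `limit()`, `start_after()` and `stream()`.

```python
from typing import Any, Iterator


def iter_in_pages(collection: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every document snapshot, fetching at most page_size per query.

    Assumption: `collection` behaves like a Firestore CollectionReference
    or Query. Ordering by "__name__" (the document ID) gives a stable
    cursor for start_after().
    """
    base_query = collection.order_by("__name__").limit(page_size)
    last_snapshot = None
    while True:
        page_query = (
            base_query
            if last_snapshot is None
            else base_query.start_after(last_snapshot)
        )
        # Drain the stream immediately so no snapshot is left waiting
        # in the iterator while slow per-user work is running.
        page = list(page_query.stream())
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # short page means there is nothing left to fetch
        last_snapshot = page[-1]
```

The slow per-user analysis then runs between short queries rather than inside one long-lived stream, for example `for user in iter_in_pages(db.collection("users")): ...`, where `db` and the collection name `"users"` are placeholders.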

Upvotes: 1

Michael T

Reputation: 140

Besides using Cloud Scheduler, you can inspect the logs to make sure the tasks ran properly and that there are no deadline issues. Application logs are grouped and only sent to Stackdriver after the task itself has executed, so when a task is forcibly terminated, no log may be output. Try catching the deadline exception so that some log is output; it may give you helpful info to start troubleshooting.
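As a rough illustration of that advice, the loop can be wrapped so that it always logs how far it got, whether it ends normally or with an exception. This is a minimal sketch: `process_users` and `analyse` are hypothetical names, and catching the generic `Exception` is an assumption, since the exact deadline exception type depends on the runtime and client libraries in use.

```python
import logging
import time
from typing import Callable, Iterable

logger = logging.getLogger("cluster_task")


def process_users(users: Iterable[str], analyse: Callable[[str], None]) -> int:
    """Run analyse() for each user and always log how many were handled."""
    done = 0
    started = time.monotonic()
    try:
        for user_id in users:
            analyse(user_id)  # hypothetical per-user cluster analysis
            done += 1
    except Exception:
        # Assumption: a deadline or timeout surfaces as an ordinary exception.
        logger.exception("task failed after %d users", done)
        raise
    finally:
        # Runs on success and on failure, so a silent early stop still
        # shows up as an unexpectedly low count in the logs.
        logger.info(
            "task stopped: %d users processed in %.1fs",
            done,
            time.monotonic() - started,
        )
    return done
```

Comparing the logged count with the expected number of users makes an early, error-free stop visible even when nothing is raised.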

Upvotes: 0
