sandeep007

Reputation: 383

Unable to run background job in Flask using Thread in GCP App Engine

I have a web application where people can upload a CSV file (no concurrent uploads, only one upload per day) containing approximately 1000 rows. Each row is processed and written to a Firestore database based on a few conditions. We do not want to process the extracted rows in parallel because that could cause concurrency problems.

Processing each row takes approximately 1 second, so the whole job takes about 15 minutes. This has to be done asynchronously. Our entire application runs on GCP App Engine, and my Python code looks as follows:

app.py

@app.route('/batch', methods=['POST'])
def read_csv(**kwargs):
    threading.Thread(target=iterate_csv_file, args=(
        df, file_name, file_content)).start()

main.py

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000, debug=True)

app.yaml

runtime: python37
entrypoint: gunicorn -t 120 -b :$PORT main:app
service: my-test
instance_class: F4
automatic_scaling:
  min_instances: 1
  max_instances: 1000

handlers:
  - url: /.*
    secure: always
    script: auto

I get the following error after about 7 minutes (after processing approximately 400 rows):

Exception in thread Thread-58:
 textPayload: "Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_user_mgt.py", line 837, in _make_request
    return self.http_client.body_and_response(method, url, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 125, in body_and_response
    resp = self.request(method, url, **kwargs)
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/firebase_admin/_http_client.py", line 117, in request
    resp.raise_for_status()
  File "/layers/google.python.pip/pip/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://identitytoolkit.googleapis.com/v1/projects/my-project/accounts" 

I have seen many tutorials that use Celery and RabbitMQ. Are they required for my use case, or should a simple background thread work? Why am I getting this error when I use a background thread? Is it related to Flask, to GCP, or to some timeout? While the background thread was running in GCP, I was navigating through the website (which would have called several APIs). I followed this tutorial to come up with the code: https://pastebin.com/vnypfpU7

Upvotes: 0

Views: 626

Answers (1)

guillaume blaquiere

Reputation: 75940

You can't run a background thread like this in App Engine standard; the runtime is not designed for it. An instance can be shut down at any time when it is not processing a request. That is your case: once the upload request has returned, no request is in progress, only a background thread running outside any request-handling context.

Even with min_instances set to 1, a new instance can be created and the old one deleted. The "1" is still respected because there is always at least one instance up to serve traffic.

To achieve this, create a Cloud Task that calls back your App Engine service. This time, the processing is performed inside a "request context" created by Cloud Tasks. It's not a user request, but it is still a request, which prevents the instance from being shut down in the middle of processing.
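As a rough sketch of what enqueueing such a task could look like with the google-cloud-tasks Python client: the handler path `/process-csv`, the helper names, and the queue, project, and location values are all hypothetical; only the service name `my-test` comes from the app.yaml in the question. The payload is assumed to be a small reference to the uploaded file rather than the file itself.

```python
import json


def build_app_engine_task(relative_uri, payload, service=None):
    """Build a Cloud Tasks task dict that sends `payload` as JSON to App Engine.

    `relative_uri` is a hypothetical handler path in your Flask app.
    """
    task = {
        "app_engine_http_request": {
            "relative_uri": relative_uri,
            "headers": {"Content-Type": "application/json"},
            # The request body has to be bytes, not str.
            "body": json.dumps(payload).encode("utf-8"),
        }
    }
    if service is not None:
        # Route the task to a specific App Engine service.
        task["app_engine_http_request"]["app_engine_routing"] = {"service": service}
    return task


def enqueue_csv_job(project, location, queue, payload):
    """Create the task in an existing queue (all three names are assumptions)."""
    # Imported lazily so build_app_engine_task works without the client library.
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = build_app_engine_task("/process-csv", payload, service="my-test")
    task["app_engine_http_request"]["http_method"] = tasks_v2.HttpMethod.POST
    return client.create_task(request={"parent": parent, "task": task})
```

The `/batch` route would then call something like `enqueue_csv_job(...)` instead of starting a thread, and a separate POST route at the hypothetical `/process-csv` path would do the row-by-row work. That handler still runs under request deadlines (the `-t 120` gunicorn timeout in the app.yaml above, plus the App Engine deadline for task requests, which depends on the scaling type), so a 15-minute job may need a longer timeout or to be split into several smaller tasks.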

Upvotes: 1
