Reputation: 435
We are running an API server where users submit jobs for calculation, which take between 1 second and 1 hour. They then make requests to check the status and get their results, which could be (much) later, or even never.
Currently jobs are added to a pub/sub queue, and processed by various worker processes. These workers then send pub/sub messages back to a listener, which stores the status/results in a postgres database.
I am looking into using Celery to simplify things and allow for easier scaling.
Submitting jobs and getting results isn't a problem in Celery, using celery_app.send_task. However, I am not sure how best to ensure the results are stored, particularly for long-running or possibly abandoned jobs.
Some solutions I considered include:
1. Give all workers access to the database and let them handle updates. The main limitation to this seems to be the db connection pool limit, as worker processes can scale to 50 replicas in some cases.
2. Listen to celery events in a separate pod, and write changes based on this to the jobs db. Only 1 connection needed, but as far as I understand, this would miss out on events while this pod is redeploying.
3. Only check job results when the user asks for them. It seems this could lead to lost results when the user takes too long, or slowly clog the results cache.
4. As in (3), but periodically check on all jobs not marked completed in the db. A tad complicated, but doable?
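To make option 4 concrete, here is a minimal sketch of such a periodic check. It uses only the standard library: sqlite3 stands in for the Postgres jobs db, and the `lookup` callable is a hypothetical stand-in for asking the Celery result backend about a job (for example via `celery_app.AsyncResult(job_id)` and its `state` and `result` attributes). The `jobs` table layout and the `reconcile` name are assumptions for illustration only.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# Hypothetical lookup: job id -> (state, result). With Celery this could wrap
# celery_app.AsyncResult(job_id) and read its .state and .result.
StateLookup = Callable[[str], Tuple[str, Optional[str]]]

# States after which a job no longer needs to be polled (assumed set).
FINAL_STATES = ("SUCCESS", "FAILURE", "REVOKED")


def reconcile(conn: sqlite3.Connection, lookup: StateLookup) -> int:
    """Copy the current state of every unfinished job into the jobs table.

    Returns the number of rows whose status changed.
    """
    placeholders = ",".join("?" * len(FINAL_STATES))
    pending = conn.execute(
        f"SELECT id, status FROM jobs WHERE status NOT IN ({placeholders})",
        FINAL_STATES,
    ).fetchall()

    updates = []
    for job_id, old_status in pending:
        state, result = lookup(job_id)
        if state != old_status:
            updates.append((state, result, job_id))

    with conn:  # a single transaction for the whole batch
        conn.executemany(
            "UPDATE jobs SET status = ?, result = ? WHERE id = ?", updates
        )
    return len(updates)
```

Run from a single scheduled process, this needs only one db connection regardless of how many workers exist. One caveat to keep in mind: Celery reports unknown or expired task ids as PENDING, so the result expiry of the backend would have to be longer than the polling interval for this to catch every result.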
Is there a standard pattern for this, or am I trying to do something unusual with Celery? Any advice on how to tackle this is appreciated.
Upvotes: 1
Views: 446
Reputation: 19822
In the past I solved a similar problem by modifying the tasks so that they not only return the result of the computation but also store it in a cache server (Redis) right before returning. A separate task ran periodically (every 5 minutes), collected these results, and wrote them in bulk to a relational database, which was quite efficient. This worked well until we started filling the cache with hundreds of thousands of results, so we replaced the periodic task with a tiny service that does the same job.
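Roughly, the pattern looks like the sketch below. This is not the original code: `ResultBuffer` is an in-memory stand-in for the Redis cache, sqlite3 stands in for the relational database, and the names `run_job`, `flush`, and the `results` table are made up for illustration.

```python
import json
import sqlite3
from typing import Dict, Iterable


class ResultBuffer:
    """In-memory stand-in for the cache (Redis) that tasks write to."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put(self, job_id: str, result: object) -> None:
        self._items[job_id] = json.dumps(result)

    def snapshot(self) -> Dict[str, str]:
        """Return a copy of everything buffered so far."""
        return dict(self._items)

    def discard(self, job_ids: Iterable[str]) -> None:
        for job_id in job_ids:
            self._items.pop(job_id, None)


def run_job(buffer: ResultBuffer, job_id: str, x: int) -> int:
    """Task body: compute, store the result in the cache, then return it."""
    result = x * x  # placeholder for the real calculation
    buffer.put(job_id, result)
    return result


def flush(buffer: ResultBuffer, conn: sqlite3.Connection) -> int:
    """Write all buffered results to the database in one bulk statement."""
    rows = list(buffer.snapshot().items())
    with conn:  # commit first, so a failed write leaves the cache untouched
        # SQLite syntax; Postgres would use INSERT ... ON CONFLICT instead.
        conn.executemany(
            "INSERT OR REPLACE INTO results (job_id, payload) VALUES (?, ?)",
            rows,
        )
    buffer.discard(job_id for job_id, _ in rows)
    return len(rows)
```

Here `flush` is what the periodic task (and later the small service) would call. Entries are removed from the cache only after the database commit succeeds, so a failed write is retried on the next run. The sketch is single-threaded; with a real shared cache, a result rewritten between the snapshot and the discard would need extra care.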
Upvotes: 2