Vadimcg

Reputation: 147

Celery loses worker

I use Celery 4.4.0 in my project (Ubuntu 18.04.2 LTS). When I raise Exception('too few functions in features to classify'), the Celery worker is lost and I get the following logs:

[2020-02-11 15:42:07,364] [ERROR] [Main ] Task handler raised error: WorkerLostError('Worker exited prematurely: exitcode 0.')
Traceback (most recent call last):
  File "/var/lib/virtualenvs/simus_classifier_new/lib/python3.7/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
    human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: exitcode 0.
[2020-02-11 15:42:07,474] [DEBUG] [ForkPoolWorker-61] Closed channel #1

Do you have any idea how to solve this problem?

Upvotes: 6

Views: 7369

Answers (2)

david.barkhuizen

Reputation: 5665

To follow up on @dejanlekic's answer: not only are WorkerLostError exceptions almost like out-of-memory (OOM) errors, they often result from an underlying out-of-memory condition.
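If memory pressure is indeed the cause, one mitigation worth considering is to have Celery recycle its pool workers before the kernel's OOM killer steps in, via the worker_max_memory_per_child and worker_max_tasks_per_child settings. A minimal sketch, assuming a standard Celery 4 app; the app name and broker URL below are placeholders, not taken from the question:

```python
# celery_app.py -- placeholder module; adjust the name and broker URL to your project
from celery import Celery

app = Celery('simus_classifier', broker='redis://localhost:6379/0')

app.conf.update(
    # Recycle a pool worker once it exceeds ~200 MB of resident memory
    # (the value is in KiB), so it is replaced cleanly instead of being
    # killed by the kernel OOM killer mid-task.
    worker_max_memory_per_child=200_000,
    # Also recycle each worker after a fixed number of tasks to cap
    # gradual memory growth from leaks or large result objects.
    worker_max_tasks_per_child=100,
)
```

Whether these limits help depends on why memory is growing, but they at least turn an abrupt OOM kill into a graceful worker replacement between tasks.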

Upvotes: 0

DejanLekic

Reputation: 19797

WorkerLostError exceptions are almost like OutOfMemory errors - they can't be solved, and they will continue to happen from time to time. What you should do is make your task(s) idempotent and let Celery retry tasks that failed due to a worker crash.

It sounds trivial, but in many cases it is not. Not all tasks can be made idempotent, for example, and Celery still has bugs in the way it handles WorkerLostError. You therefore need to monitor your Celery cluster closely, react to these events, and try to minimize them. In other words, find out why the worker crashed: was it killed by the system because it was consuming all the memory? Was it killed simply because it was running on an AWS spot instance that got terminated? Was it killed by someone executing kill -9 <worker pid>? Each of these circumstances has to be handled in its own way...
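For the retry-on-worker-crash part, a minimal sketch of the relevant knobs, combining acks_late with reject_on_worker_lost so the broker redelivers the message when a worker dies mid-task. The broker URL, task name, and task body are hypothetical placeholders; only the two task options are the point:

```python
# tasks.py -- hypothetical example; adjust the broker URL and task body to your project
from celery import Celery

app = Celery('classifier', broker='amqp://guest@localhost//')

@app.task(
    acks_late=True,              # acknowledge the message only after the task completes
    reject_on_worker_lost=True,  # re-queue the message if the worker is killed mid-task
)
def classify(doc_id, features):
    # 'features' is assumed to be a dict of feature name -> score.
    # Idempotent by construction: the result depends only on the arguments,
    # so a redelivered (re-run) task produces the same outcome.
    if len(features) < 2:
        raise Exception('too few functions in features to classify')
    return {'doc_id': doc_id, 'label': max(features, key=features.get)}
```

Note that redelivery means a crashed task can execute more than once, which is exactly why the idempotency requirement comes first.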

Upvotes: 4
