Lawrence Bird

Certain Celery tasks start but hang and never execute

I have an issue with Django and Celery where some registered tasks never get executed.

I have three tasks in my tasks.py file. Two of them, schedule_notification() and schedule_archive(), work fine and are executed at the predefined ETA.

With the schedule_monitoring() function, I can see the job is started in Celery Flower but it never actually executes. It just sits there.

I have confirmed I can run the task function directly on the worker, so I'm not sure where the issue could be.
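
For context, these tasks are queued ahead of time with an ETA, roughly like the sketch below (the exact call site is omitted here; the delay and argument values are examples only):

from datetime import datetime, timedelta, timezone

from Operations.tasks import schedule_monitoring

# 'job' is a Job instance looked up elsewhere; the 30-minute delay is just an example.
eta = datetime.now(timezone.utc) + timedelta(minutes=30)
schedule_monitoring.apply_async(args=[job.pk, 'start'], eta=eta)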

tasks.py (failing function)

@task
def schedule_monitoring(job_id: str, action: str) -> str:
    salt = OSApi() # This is a wrapper around a REST API.
    job = Job.objects.get(pk=job_id)
    target = ('compound', f"G@hostname:{ job.network.gateway.host_name } and G@serial:{ job.network.gateway.serial_number }")

    policies = [
        'foo',
        'bar',
        'foobar',
        'barfoo'
    ]

    if action == 'start':
        salt.run(target, 'spectrum.add_to_collection', fun_args=['foo'])  
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Started proactive monitoring for job.", job)
    elif action == 'stop':
        salt.run(target, 'spectrum.remove_from_collection', fun_args=['bar'])
        for policy in policies:
            salt.run(target, 'spectrum.refresh_policy', fun_args=[policy])

        create_activity("Informational", "MONITORING", "Stopped proactive monitoring for job.", job)
    else:
        raise NotImplementedError

    return f"Applying monitoring action: {action.upper()} to Job: {job.job_code}"

Celery Flower Output

Celery Configuration

# Async
CELERY_BROKER_URL = os.environ.get('BROKER_URL', 'redis://localhost:6379')
CELERY_RESULT_BACKEND = os.environ.get('RESULT_BACKEND', 'redis://localhost:6379')
CELERY_ACCEPT_CONTENT = ['application/json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'UTC'
CELERY_ENABLE_UTC = True

Below is a successful manual execution of the task, in a shell on the worker that was supposed to execute it:

>>> schedule_monitoring(job.pk, 'start')
'Applying monitoring action: START to Job: Test 1'
>>> schedule_monitoring(job.pk, 'stop')
'Applying monitoring action: STOP to Job: Test 1'
>>> exit()
Waiting up to 5 seconds.
Sent all pending logs.
root@9d045ff7dfc1:/app#

From debugging the worker, all I see is the following when the job starts, and then nothing interesting:

[2021-01-06 17:08:00,001: DEBUG/MainProcess] TaskPool: Apply <function _trace_task_ret at 0x7f6adbc29680> (args:('Operations.tasks.schedule_monitoring', '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', {'lang': 'py', 'task': 'Operations.tasks.schedule_monitoring', 'id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'shadow': None, 'eta': '2021-01-06T17:08:00+00:00', 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'parent_id': None, 'argsrepr': "(UUID('11118a85-20f2-488d-9a12-b8d200ea7a74'), 'start')", 'kwargsrepr': '{}', 'origin': 'gen442@31a9de56d061', 'reply_to': '24a8dc4c-2e5c-32ce-aa3d-84392d7cbf41', 'correlation_id': '407e8a87-b3bf-4e8f-8a17-776a33ae5fea', 'hostname': 'celery@bc4bb7af894f', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': ['11118a85-20f2-488d-9a12-b8d200ea7a74', 'start'], 'kwargs': {}}, b'[["11118a85-20f2-488d-9a12-b8d200ea7a74", "start"], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]', 'application/json', 'utf-8') kwargs:{})
[2021-01-06 17:08:00,303: DEBUG/MainProcess] basic.qos: prefetch_count->32
[2021-01-06 17:08:00,305: DEBUG/MainProcess] Task accepted: Operations.tasks.schedule_monitoring[407e8a87-b3bf-4e8f-8a17-776a33ae5fea] pid:44
[2021-01-06 17:08:00,311: DEBUG/ForkPoolWorker-3] Resetting dropped connection: storage.googleapis.com
[2021-01-06 17:08:00,383: DEBUG/ForkPoolWorker-3] https://storage.googleapis.com:443 "GET /download/storage/v1/b/foo/o/bar?alt=media HTTP/1.1" 200 96
[2021-01-06 17:08:01,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:06,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:11,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:16,228: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:21,227: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:26,229: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]
[2021-01-06 17:08:31,231: DEBUG/MainProcess] pidbox received method enable_events() [reply_to:None ticket:None]

Answers (1)

Lawrence Bird

The solution I found to this was to create two queues within Celery, one that manages the scheduled tasks via Celery Beat and another with a higher priority for the rest.

After I created separate queues, tasks started flowing and completing correctly; my guess is that the message bus or the workers were congested.

To create additional queues, do the following in settings.py:

from kombu import Queue, Exchange

CELERYD_MAX_TASKS_PER_CHILD = 4

CELERY_DEFAULT_QUEUE = 'scheduled'
CELERY_QUEUES = (
    Queue('scheduled', Exchange('scheduled'), routing_key='sched'),
    Queue('proactive_monitoring', Exchange('proactive_monitoring'), routing_key='prmon'),
)

Then when registering your task functions, pass the queue you want them to be assigned to:

tasks.py:

@task(queue='proactive_monitoring')
def schedule_monitoring(job_id: str, action: str) -> str:
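
With the queue set on the decorator, calls to the task should be routed to that queue by default. If you dispatch with apply_async you can also name the queue explicitly (a minimal sketch; the eta is whatever you already compute):

schedule_monitoring.apply_async(
    args=[job.pk, 'start'],
    eta=eta,                       # the ETA you already schedule with
    queue='proactive_monitoring',  # optional; matches the decorator setting
)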

Finally, make sure you start at least one worker for each queue. You do this by passing the queue name with -Q when starting the worker:

celery -A proj worker -l INFO -Q proactive_monitoring
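
And likewise at least one worker for the default scheduled queue (adjust the app name to your project):

celery -A proj worker -l INFO -Q scheduled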

If you are starting multiple workers on localhost, differentiate them (at least the first two in each queue) by giving each a distinct node name with -n:

celery -A proj worker -l INFO -Q proactive_monitoring -n prmon_first_worker
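
A second worker on the same host for that queue then gets its own name, for example:

celery -A proj worker -l INFO -Q proactive_monitoring -n prmon_second_worker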
