Andrew

Reputation: 3185

Celery - minimize memory consumption

We have ~300 celeryd processes running under Ubuntu 10.04 64-bit. When idle, each process takes ~19 MB RES and ~174 MB VIRT, which comes to around 6 GB of RAM in idle across all processes. In the active state a process takes up to 100 MB RES and ~300 MB VIRT.

Every process uses minidom (the XML files are < 500 KB with a simple structure) and urllib.

The question is: how can we decrease RAM consumption, at least for the idle workers? Perhaps some Celery or Python options can help. And how do we determine which part takes the most memory?
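For the profiling part, a minimal sketch of what can be measured from inside a worker or task (standard-library resource module; on Linux ru_maxrss is reported in kilobytes):

```python
# Minimal sketch: report this process's peak resident set size.
import resource

def peak_rss_mb():
    # ru_maxrss is in kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

print("peak RSS: %.1f MB" % peak_rss_mb())
```

Sampling this before and after the fetch/parse/save phases shows which step grows the process.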

UPD: these are flight search agents, one worker per agency/date. We have 10 agencies, and one user search covers 9 dates, so we run 10 × 9 = 90 agents per user search.

Is it possible to start celeryd processes on demand, to avoid idle workers (something like MaxSpareServers in Apache)?

UPD2: The agent lifecycle is: send an HTTP request, wait ~10-20 s for the response, parse the XML (takes less than 0.02 s), save the result to MySQL.
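For context, a stripped-down sketch of one such agent task (the task name, XML element, and MySQL helper are made up; assumes the Celery 2.x task decorator plus the urllib and minidom mentioned above):

```python
# Hypothetical sketch of one agent task: fetch, parse, store.
from celery.task import task
from xml.dom import minidom
import urllib

@task
def search_agency(agency_url):
    # The HTTP round trip dominates wall time (~10-20 s); the
    # process mostly sits idle waiting on the socket.
    body = urllib.urlopen(agency_url).read()
    # Parsing a < 500 KB document takes < 0.02 s.
    doc = minidom.parseString(body)
    fares = [n.firstChild.data for n in doc.getElementsByTagName("fare")]
    save_to_mysql(fares)  # hypothetical helper, elided
```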

Upvotes: 16

Views: 26254

Answers (4)

S.Lott

Reputation: 391820

Read this:

http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency

It sounds like you have one worker per celeryd. That seems wrong. You should have dozens of workers per celeryd. Keep raising the number of workers (and lowering the number of celeryd instances) until your system is very busy and very slow.
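In CLI terms: rather than 300 separate daemons, run a handful of celeryd instances with something like `celeryd --concurrency=25`, or set it in the config. A sketch with an illustrative number (Celery 2.x setting name; tune it as described above):

```python
# celeryconfig.py -- sketch: one celeryd parent managing a
# pool of 25 worker processes instead of 300 daemons.
CELERYD_CONCURRENCY = 25
```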

Upvotes: 8

Brendan Maguire

Reputation: 4541

Use autoscaling. This allows the number of workers under each celeryd instance to be increased or decreased as needed. http://docs.celeryproject.org/en/latest/userguide/workers.html#autoscaling
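In Celery 2.2+ this is a worker option, e.g. `celeryd --autoscale=10,3` (grow the pool to at most 10 processes under load, shrink back to 3 when idle), which also addresses the MaxSpareServers-style question above.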

Upvotes: 1

Tobu

Reputation: 25416

The natural number of workers is close to the number of cores you have. The workers are there so that CPU-intensive tasks can use an entire core efficiently. The broker is there so that requests that don't have a worker on hand to process them are kept queued. The number of queues can be high, but that doesn't mean you need a high number of brokers; a single broker should suffice, or you could shard queues to one broker per machine if it later turns out that fast worker-queue interaction is beneficial.

Your problem seems unrelated to that. I'm guessing that your agencies don't provide a message-queue API, and you have to keep lots of requests around. If so, you need a few (emphasis on not many) evented processes, for example Twisted or node.js based.
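To illustrate the evented approach, a minimal sketch (made-up URLs; uses Twisted's Python 2-era getPage): a single process keeps all 90 slow HTTP requests in flight at once, instead of parking 90 worker processes on them.

```python
# Sketch: 90 concurrent agency requests multiplexed by one
# Twisted reactor instead of 90 idle worker processes.
from twisted.internet import reactor, defer
from twisted.web.client import getPage

def fetch_all(urls):
    # Each getPage returns a Deferred; the reactor waits on
    # all of the pending sockets concurrently.
    return defer.DeferredList([getPage(url) for url in urls])

def done(results):
    for ok, body in results:
        if ok:
            print("got %d bytes" % len(body))  # parse/store here
    reactor.stop()

urls = ["http://agency.example/search?d=%d" % d for d in range(90)]
fetch_all(urls).addCallback(done)
reactor.run()
```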

Upvotes: 2

asksol

Reputation: 19499

S. Lott is right. The main instance consumes messages and delegates them to worker pool processes. There is probably no point in running 300 pool processes on a single machine! Try 4 or 5 multiplied by the number of CPU cores. You may gain something by running more than one celeryd with a few processes each, as some people have, but you would have to experiment for your application.

See http://celeryq.org/docs/userguide/workers.html#concurrency

For the upcoming 2.2 release we're working on Eventlet pool support. This may be a good alternative for IO-bound tasks, enabling you to run 1000+ threads with minimal memory overhead; it's still experimental, though, and bugs are being fixed for the final release.
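Roughly what the Eventlet pool buys you, as a standalone sketch (assumes the eventlet library and Python 2-era urllib2; the URL is made up): green threads cost kilobytes each, so hundreds of blocked HTTP waits fit comfortably in one process.

```python
# Sketch: hundreds of cooperative green threads sharing one process.
import eventlet
from eventlet.green import urllib2  # cooperative drop-in for urllib2

pool = eventlet.GreenPool(size=1000)

def fetch(url):
    return urllib2.urlopen(url).read()

urls = ["http://agency.example/search?d=%d" % d for d in range(90)]
for body in pool.imap(fetch, urls):
    print("got %d bytes" % len(body))
```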

See http://groups.google.com/group/celery-users/browse_thread/thread/94fbeccd790e6c04

The upcoming 2.2 release also has support for autoscaling, which adds/removes processes on demand. See the Changelog: http://ask.github.com/celery/changelog.html#version-2-2-0 (this changelog is not completely written yet).

Upvotes: 3
