JaskeyLam
JaskeyLam

Reputation: 15755

RabbitMQ-Is it a good practice to create multiple consumers for a single queue in one application process

I just work with an new project backed by RabbitMQ, and there are multiple consumer instances created listening to the same queue when the application starts. Howerver they shares the same connections with different channels.

The messages from the queue are massive(millions messages for one single producing behavior ) so I guess the very first code author is trying to do something to make consuming faster.

I am trying to find some posts discussing on this but I can't find a very certain answer.

What I get so far is:

  1. Each channel will have a separate dispatch thread
  2. The operation commands on the same channel is serialized even though they are called in multiple thread

So

  1. creating multiple consumers thus multiple channels will have multiple dispatch threads, but I don't think it provided a better performance to message dispatching since the dispatch should far from enough with one single thread.

  2. The operation of ack will can be paralized in different channels, I am not quite sure this will give any better performances.

Since more channels consume more system resources I wonder is this practice good?

Upvotes: 3

Views: 4189

Answers (1)

theMayer
theMayer

Reputation: 16167

There seem to be a few things going on here, so let's try to look at this scenario from a holistic perspective.

For starters, it sounds like the original designer of this code understood some basics about RabbitMQ (or learned a few things by trial and error), but may have had trouble putting all the pieces together- hopefully I can help.

  1. RabbitMQ connections are, in reality, AMQP-over-TCP connections (and thus are somewhere around the session layer of the OSI model). TCP connections are supposed to be opened up and used until some sort of network interruption or application shutdown closes them (and for this reason, AMQP has trouble with firewalls and other smart network devices). Using a single TCP connection for message processing activities for a single logical process is a good idea, as creating and destroying TCP connections is usually an expensive process for the computer, which leads to

  2. RabbitMQ channels are used to multiplex communication streams in the AMQP-Over-TCP connection (and are defined in the AMQP Protocol Spec). All they do is specify an integer value (I can't remember the number of bytes, but it doesn't matter anyway) used to preface the subsequent command or response on a TCP connection. Most AMQP operations are channel-specific. For the purposes of higher-level operations, channels are treated similar to connections, as they are application-level constructs.

Now, where I think the question starts to go off the rails a bit is here:

The messages from the queue are massive(millions messages for one single producing behavior ) so I guess the very first code author is trying to do something to make consuming faster.

A fundamental assumption about a system which uses queues is that messages are consumed at approximately the same rate that they are produced. Queues exist to buffer uneven producing activities. The mathematics and statistics of how queues work are quite interesting, and assuming the production of messages is done in response to some real-world stimulus, your system is virtually guaranteed to behave in a predictable manner. Therefore, your design goal is to ensure that there are enough consumers to process the messages that are produced, and to respond to changing conditions as needed. Your goal should not be to "speed up" the consumers (unless they have some specific issue), but rather to have enough consumers to process the total load.

Further, the average number of items in the queue at any time should approach zero. It is usually a good idea to have overcapacity so that you don't wind up with an unstable situation where messages start accumulating in the queue (and the queue ends up looking like the Stack Overflow Close Vote Queue).

And that brings us to an attempt to answer your fundamental question, which seems to deal with threading and possibly detailed implementation of the Java client, which I will readily admit I have not used (I'm a .NET guy).

Here are some design guidelines for your software:

  1. Ensure that a single thread uses no more than one channel.
  2. Use one TCP connection per logical consuming process.
  3. Balance the number of logical processes on a single physical machine such that resource contention is not a problem (you don't want to starve your consumers of computer resources).
  4. Try to use BASIC.GET as opposed to a push-based consumer. Use of consumers is difficult in practice, and there is no performance benefit at the protocol level over a BASIC.GET. Note I do not know if the Java library has implemented these differently such that it does cause a performance difference- stranger things have been known to happen.
  5. If you do use consumers, make sure pre-fetch is set to 0 (disabled) and that AutoAck is set to false if reliable processing is important (most applications require reliable processing). Along with this, make sure you are acknowledging messages upon completion of processing!
  6. Periodically reboot your consuming threads, channels, and processors - or do a BASIC.Recover. There are degrees of randomness that will result in unacknowledged messages accumulating over time, and this will deal with it.
  7. Again, if you prefer to use consumers, generally speaking to share consumers across channels is a bad idea. Each consumer should get its own channel.

Upvotes: 4

Related Questions