Srikanth

Reputation: 1025

RabbitMQ consumer utilization zero under high load

We use a Spring AMQP listener container with about 40 consumers and a prefetch count of 1. The message TTL is about 60 seconds, after which messages go to the dead-letter queue.

The operation performed by each consumer is a database update, which is slower than the rate at which messages arrive in the queue.

After some time, messages pile up in the queue and consumer utilization drops to zero. I was under the impression that the consumers were blocked on the database. However, the thread dump shows all the consumers in a wait state on RabbitMQ; no messages are being processed.

    "SimpleAsyncTaskExecutor-7" #51 prio=5 os_prio=0 tid=0x00007fcb01ad0800 nid=0x58f7 waiting on condition [0x00007fcae5af1000]
    java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000000854c30c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
    at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
    at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.nextMessage(BlockingQueueConsumer.java:390)
    at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:1097)
    at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:1086)
    at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$1100(SimpleMessageListenerContainer.java:93)
    at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1203)
    at java.lang.Thread.run(Thread.java:745)

The queue also goes into the flow state.

I'm not sure why message processing on the queue has stopped; I understand why publishing to it would be restricted.

Any suggestions would help.

Upvotes: 1

Views: 4284

Answers (1)

cantSleepNow

Reputation: 10192

I'm never sure how to answer requests for "suggestions" on SO, so I'll suggest :)

Here are a couple of suggestions:

  • increase the number of consumers
  • increase the prefetch limit
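
For reference, here is a minimal sketch of where these two knobs live on a Spring AMQP SimpleMessageListenerContainer. The class name, bean name, queue name, injected listener and all numbers are placeholders I made up, not values from your setup:

    import org.springframework.amqp.core.MessageListener;
    import org.springframework.amqp.rabbit.connection.ConnectionFactory;
    import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    public class ListenerTuningConfig {

        // Assumes a ConnectionFactory bean and a MessageListener bean
        // (the one doing the database update) already exist in the context.
        @Bean
        public SimpleMessageListenerContainer updateListenerContainer(
                ConnectionFactory connectionFactory, MessageListener dbUpdateListener) {
            SimpleMessageListenerContainer container =
                    new SimpleMessageListenerContainer(connectionFactory);
            container.setQueueNames("db.update.queue"); // hypothetical queue name
            container.setConcurrentConsumers(40);       // starting number of consumers
            container.setMaxConcurrentConsumers(80);    // placeholder upper bound for scaling
            container.setPrefetchCount(10);             // placeholder; the question uses 1
            container.setMessageListener(dbUpdateListener);
            return container;
        }
    }

Keep in mind that a higher prefetch only helps if the consumers are actually idle between messages; if the database is the bottleneck, more consumers just move the contention there.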

Now, I can't tell you exactly what values to use; this has to be fine-tuned. You can try either of these changes or both. Maybe this article can give you a rough idea of how to start (i.e. what values).
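
As a rough starting point, you can estimate the consumer count from the arrival rate and the time one database update takes: the consumers keep up only if consumers / update time is at least the arrival rate. Below is a back-of-the-envelope sketch in plain Java; the class name, method names and the numbers in main are hypothetical, and it ignores network latency and contention on the database:

    public class ConsumerSizing {

        // Smallest consumer count whose combined rate covers the arrival rate,
        // i.e. ceil(arrivalsPerSecond * millisPerUpdate / 1000).
        static long consumersNeeded(long arrivalsPerSecond, long millisPerUpdate) {
            return (arrivalsPerSecond * millisPerUpdate + 999) / 1000;
        }

        // Upper bound on messages per second for a given consumer count,
        // assuming each consumer handles one message at a time.
        static long maxThroughputPerSecond(long consumers, long millisPerUpdate) {
            return consumers * 1000 / millisPerUpdate;
        }

        public static void main(String[] args) {
            long arrivalsPerSecond = 500; // hypothetical arrival rate
            long millisPerUpdate = 200;   // hypothetical time per database update
            System.out.println("Max throughput with 40 consumers: "
                    + maxThroughputPerSecond(40, millisPerUpdate) + " msg/s");
            System.out.println("Consumers needed to keep up: "
                    + consumersNeeded(arrivalsPerSecond, millisPerUpdate));
        }
    }

If the number you get is far above what the database can handle concurrently, adding consumers will not help, and messages will keep expiring into the dead-letter queue after the 60-second TTL.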

Additionally, you could scale out: mirror the queues to a couple more nodes in the cluster and consume the messages from there.

Also check this article. Credit flow looks like something you could also look into, as well as message paging.

Upvotes: 1
