Lostsoul

Reputation: 25991

Is it possible to ensure unique messages are in a rabbitmq queue?

Basically my consumers are producers as well. We get an initial dataset and it gets sent to the queue. A consumer takes an item and processes it, from that point there's 3 possibilities:

  1. Data is good and gets put into a 'good' queue for storage
  2. Data is bad and discarded
  3. Data is neither good nor bad yet, so it is broken down into smaller parts and sent back to the queue for further processing.

My problem is with step 3: because the queue grows very quickly at first, it's possible that a piece of data is broken down into a part that's duplicated in the queue, and the consumers continue to process it and end up in an infinite loop.

I think the way to prevent this is to keep duplicates from going into the queue. I can't do this on the client side because over the course of an hour I may have many cores dealing with billions of data points (having each client scan the queue before submitting would slow me down too much). I think this needs to be done on the server side but, as I mentioned, the data is quite large and I don't know how to efficiently ensure there are no duplicates.

I might be asking the impossible but thought I'd give it a shot. Any ideas would be greatly appreciated.

Upvotes: 13

Views: 24646

Answers (3)

Enderson Maia

Reputation: 263

There's a plugin for RabbitMQ that lets you do this kind of control with some additional headers.

You should enable the plugin and set x-deduplication-header on the message, with a hash or something else that uniquely identifies the message. When another message with the same header value reaches the RabbitMQ exchange, it will not be routed to any queue.

See: https://github.com/noxdafox/rabbitmq-message-deduplication
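As a rough sketch of the "hash that uniquely identifies the message" part, the header value could be derived from the payload itself. The helper name `dedup_headers` and the choice of SHA-256 are assumptions for illustration, not something the plugin requires; only the header name `x-deduplication-header` comes from the answer above. The exchange itself still has to be declared the way the plugin's README describes.

```python
import hashlib


def dedup_headers(payload: bytes) -> dict:
    """Hypothetical helper: build headers whose value is a hash of the payload.

    Two messages with byte-identical payloads get the same header value,
    which is what lets the broker-side plugin recognise them as duplicates.
    """
    digest = hashlib.sha256(payload).hexdigest()
    return {"x-deduplication-header": digest}


# Identical parts produce identical header values; different parts do not.
headers_a = dedup_headers(b"part-17")
headers_b = dedup_headers(b"part-17")
headers_c = dedup_headers(b"part-18")

# With a client such as pika, this dict would be passed as the message
# headers, e.g. pika.BasicProperties(headers=headers_a), when publishing.
```

Hashing is done once per published message on the producer, so it avoids the "scan the queue before submitting" cost the question is worried about.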

Upvotes: 2

Roman Gaufman

Reputation: 1124

I think even if you could fix the issue of not sending duplicates to the queue, you will sooner or later hit this issue:

From RabbitMQ Documentation: "Recovery from failure: in the event that a client is disconnected from the broker owing to failure of the node to which the client was connected, if the client was a publishing client, it's possible for the broker to have accepted and passed on messages from the client without the client having received confirmation for them; and likewise on the consuming side it's possible for the client to have issued acknowledgements for messages and have no idea whether or not those acknowledgements made it to the broker and were processed before the failure occurred. In short, you still need to make sure your consuming clients can identify and deal with duplicate messages."

Basically, it looks like this: you send a request to RabbitMQ, and RabbitMQ replies with an ACK but, for one reason or another, your consumer or producer does not receive it. RabbitMQ has no way of knowing the ACK was not received, and your producer will end up re-sending the message, having never received an ACK.

It is a pain to handle duplicate messages, especially in apps where messaging is used as a kind of RPC, but it looks like this is unavoidable when using this kind of messaging architecture.
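One common way to make a consumer tolerate redelivered messages is to remember which message IDs it has already handled and skip repeats. The sketch below is a minimal in-memory illustration; the class name `IdempotentConsumer`, the message IDs, and the plain `set` are all assumptions. At the scale described in the question, the set would have to be replaced by a shared or bounded store.

```python
class IdempotentConsumer:
    """Hypothetical wrapper that runs a handler at most once per message ID."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: IDs fit in memory on one consumer

    def handle(self, message_id, body):
        """Process body unless message_id was already handled.

        Returns True if the handler ran, False if the message was a duplicate.
        """
        if message_id in self._seen:
            return False
        self._handler(body)
        # Record the ID only after the handler succeeds, so a failed
        # attempt can be retried when the broker redelivers the message.
        self._seen.add(message_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
results = [
    consumer.handle("m1", "a"),
    consumer.handle("m2", "b"),
    consumer.handle("m1", "a"),  # redelivery of m1 is skipped
]
```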

Upvotes: 11

Brian Kelly

Reputation: 19295

The core problem seems to be this:

"...it's possible that a piece of data is broken down into a part that's duplicated in the queue and the consumers continue to process it and end up in an infinite loop."

You can focus on uniqueness of your queued items all you want, but the issue above is where you should focus your efforts, IMO. One way to prevent infinite looping might be to have a "visited" bit in your message payload that is set by consumers before they re-queue the broken-down item.

Another option would be to have the consumers re-queue back to a special queue that is treated slightly differently to prevent infinite looping. Either way, you should attack the issue by dealing with it as a core part of your application's strategy rather than using a feature of a messaging system to step around it.
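The "visited" bit idea could be sketched roughly as follows, using an in-memory deque in place of a real queue. Everything here is an assumption made for illustration: the `classify` and `split` rules are toy stand-ins for the real good/bad/undecided logic, and "park the item instead of splitting it again" is just one possible reading of what consumers should do with an already-visited item.

```python
from collections import deque


def classify(n):
    """Toy rule (assumption): negative is bad, odd is good, the rest undecided."""
    if n < 0:
        return "bad"
    if n % 2 == 1:
        return "good"
    return "undecided"


def split(n):
    """Toy breakdown into two halves; note that 0 splits into 0 and 0."""
    return [n // 2, n - n // 2]


def run(initial):
    """Drain the queue, splitting each undecided item at most once."""
    queue = deque({"value": v, "visited": False} for v in initial)
    good, parked = [], []
    while queue:
        msg = queue.popleft()
        verdict = classify(msg["value"])
        if verdict == "good":
            good.append(msg["value"])
        elif verdict == "undecided":
            if msg["visited"]:
                # Already broken down once: set it aside instead of looping.
                parked.append(msg["value"])
            else:
                for part in split(msg["value"]):
                    queue.append({"value": part, "visited": True})
        # "bad" items are simply discarded.
    return good, parked


good, parked = run([6, 0, -1])
```

Without the flag, the item 0 would keep splitting into two more zeros forever; with it, the loop ends and the stuck items are collected in `parked` for separate handling.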

Upvotes: 3
