user280610

Database Queues and Queue Processing

I am currently in the process of putting together a reference architecture for a distributed event-based system where events are stored in a SQL Server Azure database using plain old tables (no SQL Server Service Broker).

Events will be processed using Worker Roles that will poll the queue for new event messages.

In my research, I see a number of solutions that allow multiple processors to process messages off of the queue. The problem I have with a lot of the patterns I'm seeing is the added complexity of managing locking, etc., when multiple processes are trying to access a single message queue.

I understand that the traditional queue pattern is to have multiple processors pulling from a single queue. However, assuming that event messages can be processed in any order, is there any reason not to just create a one-to-one relationship between a queue and its queue processor and just load-balance between the different queues?

queue_1 => processor_1
queue_2 => processor_2

This implementation avoids all of the plumbing necessary to manage concurrent access to the queue across multiple processors. The event publisher can use any load-balancing algorithm to decide which queue to publish messages to.

The fact that I don't see this sort of implementation in any of my searches makes me think I'm overlooking a major deficit in this design.

Edit

This post has triggered a debate over using database tables as queues vs. MSMQ, Azure Queues, etc. I understand that there are a number of native queuing options available to me, including Durable Message Buffers in Azure AppFabric. I've evaluated my options and determined that SQL Azure tables will be sufficient. The intention of my question was to discuss the use of multiple processors against a single queue vs. one processor per queue.

Upvotes: 5

Views: 5673

Answers (4)

Remus Rusanu

Reputation: 294467

See Using tables as Queues for a more detailed discussion of this topic. The issue is not only how you access the 'queue', but also how you index it: the clustered index must allow a direct seek to the next row to dequeue, otherwise you'll deadlock constantly.
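As a rough illustration of a table used as a queue, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for SQL Azure. The table name `event_queue` and the helper functions are hypothetical, not taken from the linked article. It only shows the shape of the idea (the table's key is the dequeue order, so the next row is found by a seek rather than a scan) on a single connection; it does not reproduce SQL Server's locking behavior, which is where the deadlock concern above comes from.

```python
import sqlite3
from typing import Optional


def create_queue(conn: sqlite3.Connection) -> None:
    # In SQLite an INTEGER PRIMARY KEY is the table's own B-tree key, so rows
    # are physically ordered by id and the oldest row is found by a direct seek.
    conn.execute(
        "CREATE TABLE event_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    with conn:  # commits on success, rolls back on exception
        conn.execute("INSERT INTO event_queue (payload) VALUES (?)", (payload,))


def dequeue(conn: sqlite3.Connection) -> Optional[str]:
    """Remove and return the oldest payload, or None if the queue is empty."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM event_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM event_queue WHERE id = ?", (row[0],))
        return row[1]


conn = sqlite3.connect(":memory:")
create_queue(conn)
enqueue(conn, "event-a")
enqueue(conn, "event-b")
first = dequeue(conn)
second = dequeue(conn)
empty = dequeue(conn)
```

With several processors on SQL Server, the select-then-delete pair would need to become a single atomic statement with appropriate locking hints; that part is deliberately left out of this sketch.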

You want your processors to race to the same queue; load balancing by spreading out to different queues is an anti-pattern. It leads to convoys and artificial latency, where items are queued up behind a slow processor while other processors sit idle because their queues are empty.
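To make the convoy effect concrete, here is a small Python simulation. It is only a sketch: the workload, the round-robin publisher, and the function names are assumptions chosen for illustration, and processing times are treated as known in advance. It compares the time at which all work is done when two processors race to one shared queue versus when each processor owns its own queue.

```python
import heapq
from typing import List


def shared_queue_finish_time(durations: List[float], workers: int) -> float:
    """All workers race to one queue: whichever is free takes the next item."""
    free_at = [0.0] * workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def per_worker_queue_finish_time(durations: List[float], workers: int) -> float:
    """Publisher round-robins items across one private queue per worker."""
    totals = [0.0] * workers
    for i, d in enumerate(durations):
        totals[i % workers] += d  # item i can only be handled by worker i % workers
    return max(totals)


# Hypothetical workload: one slow event followed by five quick ones.
durations = [10.0, 1.0, 1.0, 1.0, 1.0, 1.0]
shared = shared_queue_finish_time(durations, workers=2)
split = per_worker_queue_finish_time(durations, workers=2)
print(shared, split)
```

With this workload the shared queue finishes at time 10, while the one-queue-per-processor layout finishes at time 12: the second processor runs out of work at time 3 and sits idle while two quick events wait behind the slow one in the other queue.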

Upvotes: 5

regilero

Reputation: 30556

The point you're missing, to my mind, is that one of the important points of using queues is that orders are saved: once an order is in the queue, it won't be lost, whatever happens.

Poller processes can die, or they can have a lot of different problems; you don't care, because the queue is the place where the orders are safe.

Pollers don't require the same level of robustness. Postfix, for example, is a very secure mail transport implementation where message queues are used at many levels (each subsystem in the application that requires a different security level communicates with the others through queues). You can switch off the power and you will not lose any mail: workers can die very badly, mails can't.

Edit

That means the basic usage is storing an order and ignoring what the workers will do with it, how many workers are still alive, etc. So the only reason to handle several queues is to manage several destinations for your order (application logic), not to manage the way the workers should work with them (decoupling).

Upvotes: 0

David Makogon

Reputation: 71119

As S.Lott mentioned, there are message queue mechanisms you can use. MSMQ won't really help in Windows Azure, but Windows Azure already has a durable queue mechanism. You can easily set up each worker role instance to read one (or more) queue items. Once a queue item is read, it's "invisible" for whatever length of time you specify (or 30 seconds if no time is specified). Queue messages can be up to 8 KB, and they're considered "durable" - all Azure storage is replicated a minimum of 3 times (as is SQL Azure).
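The "invisible for a while" behavior can be sketched with a toy in-memory model in Python. This is not the Azure storage client API: the `VisibilityQueue` class and its methods are hypothetical, and time is passed in explicitly so the example is deterministic. It only illustrates why several consumers can read from one queue without stepping on each other, and why a message reappears if its consumer dies before deleting it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare entries by identity, not by field values
class Entry:
    body: str
    visible_at: float = 0.0  # time from which the message can be read again


class VisibilityQueue:
    """Toy model of a queue with a visibility timeout (hypothetical API)."""

    def __init__(self, default_timeout: float = 30.0) -> None:
        self.default_timeout = default_timeout
        self._entries: List[Entry] = []

    def put(self, body: str) -> None:
        self._entries.append(Entry(body))

    def get(self, now: float, timeout: Optional[float] = None) -> Optional[Entry]:
        """Return the first visible message and hide it until now + timeout."""
        for entry in self._entries:
            if entry.visible_at <= now:
                hide_for = self.default_timeout if timeout is None else timeout
                entry.visible_at = now + hide_for
                return entry
        return None

    def delete(self, entry: Entry) -> None:
        """Called by a consumer once the message is fully processed."""
        self._entries.remove(entry)


q = VisibilityQueue()
q.put("event-1")
first = q.get(now=0.0)    # consumer A reads the message; hidden until t=30
second = q.get(now=10.0)  # consumer B sees nothing while it is hidden
retry = q.get(now=30.0)   # A never deleted it, so it becomes visible again
q.delete(retry)
after_delete = q.get(now=100.0)
```

The message is removed only by the explicit delete, so a consumer that crashes mid-processing simply lets the timeout expire and another consumer picks the message up.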

While you can implement something like what gbn describes, I really think you should consider the native Azure Queue service when working in Windows Azure. You'll easily be able to scale to multiple queue consumers and won't have to worry about concurrency or special load-balancing code - just increase (or decrease) instance count.

For more info about Windows Azure queues, check out the Azure Platform Training Kit - there are several simple labs that walk you through queue basics.

Upvotes: 1

gbn

Reputation: 432672

Tables as queues are quite easy to do. See my SO answer here please: SQL Server Process Queue Race Condition

Upvotes: 1
