Michael

Reputation: 521

Message queue like RabbitMQ for high volume writes to SQL database?

The scenario is writing high-volume data, such as click tracking or mouse movements, from a web application to a SQL database. The data doesn't need to be written right away, because analysis happens on a recurring basis, such as daily or weekly.

I want some feedback on a solution that comes to mind:

The click and mouse data is published to a message queue. The queue holds items in memory, so publishing should be fast, faster than a direct SQL write. Then, on some other server, a job plugs away at retrieving the next queue item and writing the data to SQL.
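Roughly, I picture the producer side looking something like this (Python with the pika client; the queue name, host, and payload fields are just placeholders):

```python
# Rough sketch of the producer side (the web application) using pika.
# Queue name, host, and payload fields are placeholders, not a real schema.
import json
import uuid
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
# durable=True so the queue definition survives a broker restart
channel.queue_declare(queue='click_events', durable=True)

def publish_click(user_id, x, y):
    event = {
        'event_id': str(uuid.uuid4()),  # unique id lets a consumer deduplicate
        'user_id': user_id,
        'x': x,
        'y': y,
    }
    channel.basic_publish(
        exchange='',
        routing_key='click_events',
        body=json.dumps(event),
        # delivery_mode=2 marks the message persistent on disk
        properties=pika.BasicProperties(delivery_mode=2),
    )
```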

Does anyone know of implementations like this? What pitfalls am I failing to see? If this solution is not a good one, are there other alternatives?

Regards

Upvotes: 6

Views: 9713

Answers (2)

antken

Reputation: 989

RabbitMQ is meant for real-time message exchange, not for temporarily buffering data. If you are able to consume all data as soon as it arrives in your queues, this solution will work for you. Otherwise RabbitMQ will grow in memory and eventually die, and you will have to configure it to throw some data away (there are a lot of options for choosing the rules for this).
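For example, a per-queue length limit tells RabbitMQ to discard the oldest messages once the cap is reached; the queue name and limit below are only illustrative:

```python
import pika

# Illustrative sketch: cap the queue at one million messages; with
# 'drop-head' overflow behaviour, RabbitMQ discards the oldest
# messages instead of growing without bound.
connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(
    queue='click_events',
    durable=True,
    arguments={'x-max-length': 1_000_000, 'x-overflow': 'drop-head'},
)
```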

You could instead store the data in a Redis cache; you can write to it as fast as you can publish events to RabbitMQ. Then you can listen for new entries in Redis from a remote server and fill up whatever database storage you use, or even use Redis as your data store.
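A minimal sketch of that idea, assuming a Redis list as the buffer (the key name and payload fields are made up):

```python
import json
import redis

r = redis.Redis(host='localhost', port=6379)

# Web tier: push each event onto a Redis list (an in-memory buffer).
def record_click(user_id, x, y):
    r.rpush('click_buffer', json.dumps({'user_id': user_id, 'x': x, 'y': y}))

# Recurring job: drain the buffer in batches and hand off to a bulk INSERT.
def drain(batch_size=1000):
    batch = []
    for _ in range(batch_size):
        raw = r.lpop('click_buffer')
        if raw is None:
            break  # buffer empty
        batch.append(json.loads(raw))
    return batch
```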

Upvotes: 1

chris.ellis

Reputation: 893

To solve a very similar problem I was considering doing exactly this. In the end we decided against it because we did need access to the data very quickly. However, I still like the idea.

I've also recently learnt that, under the hood, this is exactly how Microsoft Dynamics CRM does its database updates, using message passing.

Things I think you would need to pay careful attention to:

  1. Make sure that if your RabbitMQ instance disappeared, it wouldn't have any effect on your client. Rabbit dying is bad enough; your client erroring because Rabbit is down would be terrible.
  2. If it's truly very high volume (and it's good practice for reliability anyway), clustering is something worth looking at.
  3. Paying attention to your dead-letter queues is a must. The ability to play back messages which failed for some reason is great: in theory, at least, your data should always eventually reach your database, even if the database went down for a period of time.
  4. Make sure your consumers can keep up with the number of messages being passed in. This should be solvable by adding more consumers to a given queue, which leads to...
  5. Idempotency of messages. Given that your messages relate directly to a DB write, they HAVE to be idempotent; see the sketch after this list.
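As a rough sketch of points 4 and 5 (the queue name, table, and payload fields are my own assumptions, not anything from the question): bound the per-consumer backlog with basic_qos, ack only after the write commits, and key the insert on a unique event id so a redelivered message is a harmless no-op:

```python
# Hypothetical consumer sketch. Idempotency comes from a unique
# event_id primary key: a replayed message produces a no-op insert
# rather than a duplicate row. SQLite stands in for the SQL database.
import json
import sqlite3
import pika

db = sqlite3.connect('clicks.db')
db.execute("""CREATE TABLE IF NOT EXISTS clicks (
                  event_id TEXT PRIMARY KEY,
                  user_id  TEXT,
                  x INTEGER,
                  y INTEGER)""")

def on_message(channel, method, properties, body):
    event = json.loads(body)
    # INSERT OR IGNORE makes redelivery safe: the primary key rejects duplicates.
    db.execute(
        "INSERT OR IGNORE INTO clicks (event_id, user_id, x, y) VALUES (?, ?, ?, ?)",
        (event['event_id'], event['user_id'], event['x'], event['y']),
    )
    db.commit()
    # Ack only after the write succeeds; a crash before this point means
    # the message is redelivered and (harmlessly) re-inserted.
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='click_events', durable=True)
channel.basic_qos(prefetch_count=100)  # bound unacked messages per consumer
channel.basic_consume(queue='click_events', on_message_callback=on_message)
channel.start_consuming()
```

Adding more consumers on the same queue then scales throughput without risking duplicate rows.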

Upvotes: 0
