user1105983

Reputation: 41

Strategy to save messages from live chat into MySQL or DynamoDB

I am writing a live chat application that will be used by many users. I am thinking about using Amazon ElastiCache for Redis to manage our pub/sub and a cache of the latest messages.

The only problem I see is saving these live messages to a database for future use. Any suggestions on what strategy I can use to save these messages from ElastiCache into a database?

Is RDS preferred, or should I use a NoSQL store such as DynamoDB for these messages? Should I create a queue to store the messages from the cache, or can saving them in real time also work?

Thanks

Upvotes: 1

Views: 2066

Answers (2)

kta

Reputation: 20140

Is RDS preferred, or should I use a NoSQL store such as DynamoDB for these messages?

  • Start with a relational database.

Should I create a queue to store the messages from the cache, or can saving them in real time also work?

  • Besides using a cache, you should implement pub/sub for incoming messages. You can use Redis for both purposes.

Upvotes: 0

Ryan Merl

Reputation: 186

The appropriate strategy here depends largely on volume, expected query pattern, and message retention. Let's assume that you'll want to support permanent retention and go from there:

Large RDS instances can easily handle thousands of writes per second, and read slaves will help balance read load efficiently. Aurora in particular is very good for that, and I'd suggest you look into it and compare it against traditional RDS. In my experience, Postgres as the underlying engine will also give higher write throughput than a MySQL-backed instance, because its locking strategy is more favorable for overall throughput. If your live messages are relayed via a pub/sub system, an extra "recent messages" cache in Redis may not actually be necessary; recent messages could be served by a read slave, or simply by the master if the volume is low enough. This will also depend on the type of chat system: 1-on-1 chat, room-based chat, and global chat have significantly different read characteristics.

The biggest problem with the SQL solution will be the accumulation of messages over time, and being able to efficiently surface any message from the full history once your message count runs toward the billion+ scale. Depending on the chat type, this may be shardable, but a NoSQL solution might make more sense there. NoSQL stores are, of course, not without their caveats. They scale more horizontally, can handle higher growth in writes or messages per second at the top end, and shard more naturally based on the data model, but their data models are more restrictive and harder to query in certain ways.
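
To make the "natural sharding based on the data model" point concrete, here is a minimal sketch of one way message keys could be laid out for a partition-key/sort-key store such as DynamoDB or Cassandra. The key format, the monthly bucket, and the `message_key` helper are all assumptions for illustration, not something either database prescribes.

```python
from datetime import datetime, timezone


def message_key(conversation_id: str, sent_at: datetime, message_id: str) -> dict:
    """Build a hypothetical partition key / sort key pair for one chat message.

    Assumes `sent_at` is a timezone-aware datetime.
    """
    sent_at = sent_at.astimezone(timezone.utc)
    # Partition by conversation plus a monthly bucket so that a single busy
    # room does not grow one partition without bound (bucket size is a guess).
    bucket = sent_at.strftime("%Y-%m")
    # The sort key starts with a fixed-width UTC timestamp, so lexicographic
    # order matches chronological order within a partition.
    sort_key = f"{sent_at.strftime('%Y%m%dT%H%M%S.%f')}#{message_id}"
    return {"pk": f"conv#{conversation_id}#{bucket}", "sk": sort_key}


if __name__ == "__main__":
    first = message_key("room42", datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc), "m1")
    second = message_key("room42", datetime(2024, 1, 15, 12, 31, tzinfo=timezone.utc), "m2")
    print(first)
    print(first["pk"] == second["pk"], first["sk"] < second["sk"])
```

With a layout like this, "latest N messages in a room" is a cheap range read on one partition, while "every message a given user ever sent" is not, which is the kind of query restriction mentioned above.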

That being said, if you're not planning on passing the billion-message mark or thousands of messages per second, starting with SQL will probably give you more simplicity and flexibility. Starting with a NoSQL database you have less expertise in is more likely to get you into trouble sooner, through unexpected caveats or development issues.

In terms of the write pattern you'd actually use, I think writing to the database first, then the cache, and publishing to a pub/sub topic only after a successful write helps ensure historical consistency. That also depends on the guarantees you'd like to make, though. If live delivery is more important than historical accuracy, then the opposite order probably applies. If you select a SQL database, however, this means that your throughput is tied directly to the write throughput of a single SQL master. Postgres has recently introduced the possibility of bi-directional replication, which gives you multi-master support, but it has a lot of caveats and I don't believe it is supported by RDS anyway.
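
As a rough illustration of the database-first ordering described above, here is a minimal, self-contained sketch. It uses `sqlite3` as a stand-in for the SQL database, an in-memory `deque` as a stand-in for the Redis "recent messages" cache, and plain callbacks as a stand-in for pub/sub subscribers. The `ChatStore` class, the table layout, and `RECENT_LIMIT` are hypothetical names chosen for the example.

```python
import sqlite3
import time
from collections import defaultdict, deque

RECENT_LIMIT = 50  # hypothetical number of recent messages cached per room


class ChatStore:
    """Stand-in for the database + cache + pub/sub trio (assumed design)."""

    def __init__(self, db_path: str = ":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " room_id TEXT NOT NULL,"
            " sender TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " sent_at REAL NOT NULL)"
        )
        # In-memory substitutes for the cache and the pub/sub channel.
        self.recent = defaultdict(lambda: deque(maxlen=RECENT_LIMIT))
        self.subscribers = defaultdict(list)

    def subscribe(self, room_id, callback):
        self.subscribers[room_id].append(callback)

    def send(self, room_id, sender, body):
        sent_at = time.time()
        # 1. Durable write first. If the insert raises, the transaction is
        #    rolled back and nothing below runs, so nothing is published.
        with self.db:
            cur = self.db.execute(
                "INSERT INTO messages (room_id, sender, body, sent_at)"
                " VALUES (?, ?, ?, ?)",
                (room_id, sender, body, sent_at),
            )
        message = {"id": cur.lastrowid, "room_id": room_id,
                   "sender": sender, "body": body, "sent_at": sent_at}
        # 2. Update the recent-messages cache.
        self.recent[room_id].append(message)
        # 3. Publish to live subscribers only after the write succeeded.
        for callback in self.subscribers[room_id]:
            callback(message)
        return message


if __name__ == "__main__":
    store = ChatStore()
    store.subscribe("room1", lambda m: print("delivered:", m["body"]))
    store.send("room1", "alice", "hello")
```

If live delivery matters more than history, the same three steps could be reordered so that publishing happens first and the database write is deferred, for example through a queue drained by a background worker.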

For pub/sub, Redis would likely be sufficient, but this again depends on scale. At the higher end, something more distributed and fault tolerant may be more appropriate. For instance, AWS has a dedicated pub/sub service in SNS. That would relieve you of managing it yourself, and it will likely have a lot more room for growth in message throughput. Redis is great and incredibly fast, but it is also going to be a single point of failure, is memory constrained, and is ultimately bound to a single thread. If you're starting at the low end of the scale and don't plan on hitting very high throughput, though, Redis would be perfectly sufficient.

IMPORTANT: One thing about using Redis for pub/sub, however, is that Redis should not be exposed to outside connections. That is a potentially massive security issue, so if you have clients outside of your network connecting to it directly (as I assume you would want with a chat system), Redis would be a bad choice. It should always be blocked from outside connections. Always.


TL;DR:

- For lower ends of scale, RDS will likely meet your needs with a traditional master/slave setup for quite a while, but a NoSQL solution like DynamoDB or Cassandra would better meet long-term growth, incredibly high throughput, or significant data volume.
- Redis is probably not a great choice for pub/sub if clients would connect to it directly, because of the security concerns. It may or may not be needed as a caching layer; other pub/sub technologies would likely be sufficient for live message delivery.

Upvotes: 5
