raj

Reputation: 187

Real-time duplication of data among EC2 instances located in different regions

I'm new to AWS and back-end architecture in general. My current configuration is an EC2 instance (in the south-east Asia region, Singapore) running a Twisted server for a real-time chat app.

Currently, in my implementation, whenever a sender sends a message to the server, it is stored in a Python dictionary on the server if the receiver is not online. So basically it is storing this message in the instance's RAM. Now I want to make the app available worldwide, so I'll be running it on instances in different regions. My question is: how am I supposed to duplicate/replicate this dictionary, stored in the RAM of one instance, to all the other instances, so it is readily available in all regions? (The reason for storing the messages in RAM and not in a database is the nature of the app. The app involves a large volume of messages sent in bursts, which requires it to be considerably faster than the I/O read/write speeds a persistent DB store offers.) My aim is to make the app available globally while keeping real-time performance.
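For illustration, here's a simplified sketch of roughly what I'm doing now (the names and the conn.send() call are made-up placeholders, not my actual code):

    # Undelivered messages are held in a plain dict in RAM, keyed by recipient.
    pending_messages = {}  # recipient_id -> list of (sender_id, text) tuples

    def on_message(sender_id, recipient_id, text, online_connections):
        conn = online_connections.get(recipient_id)
        if conn is not None:
            conn.send(text)            # recipient online: deliver immediately
        else:
            # recipient offline: buffer the message in RAM until they reconnect
            pending_messages.setdefault(recipient_id, []).append((sender_id, text))

    def on_connect(recipient_id, conn):
        # flush everything buffered while the recipient was away
        for sender_id, text in pending_messages.pop(recipient_id, []):
            conn.send(text)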

(Kindly don't flag this question as "opinion-based" and close it. I'm new to server-side architecture and I really need someone to at least point me in the right direction. I don't think I'll be able to find help on this anywhere other than Stack Overflow.)

Upvotes: 0

Views: 229

Answers (1)

Sjuul Janssen

Reputation: 1812

Here are a few things I would think about if I had to build it myself (I've implemented most of these pointers in our own project, and it took me quite a while).

  • If you really, really need all servers to be in sync, you'll need a consensus protocol. If you do, don't build this yourself; it's going to take a lot of time and errors.
  • If you can, partition your chat data into chatrooms and have only a few servers handle one chatroom.
  • I've used msgpack to encode my data. It's faster and smaller than JSON.
  • You'll benefit a lot from compressing your data before you send it over the wire. Have a look at something like zlib or lz4.
  • Even though compressed msgpack is almost the same size as compressed JSON, I'd choose msgpack because it's faster and easier to parse, since it's length-prefix encoded (see the encoding sketch after this list).
  • I would try to send messages together: batch up all messages every x ms (in my project I chose 100 ms). Batching up messages will save you a lot of bandwidth, since your compression algorithm can remove more duplication (see the batching sketch after this list).
  • You'll have to handle connection timeouts. Only regard a message as sent and done when you get a reply back (you'll have to design/choose your protocol to handle that).
  • Think about what is acceptable: how much data you're willing to lose when something crashes or otherwise fails. If you're not willing to lose data, you'll have to implement something that stores it to disk.
  • I've had the problem that writes to the database we use (Google Cloud Datastore) take a long time as well, somewhere between 100 ms and 900 ms depending on how much I store. What I did was only persist this data every x seconds and set flags on objects that need to be saved on the next run (see the dirty-flag sketch after this list). Of course, you can only do this if you're willing to lose some data when your program crashes.
  • You'll need something to keep track of which servers are running and which server is responsible for which piece of data.
  • Set up something that checks whether your connection is alive, for example by sending echo requests and echoes every x seconds (see the keep-alive sketch after this list). The sooner you detect a failure, the better. Note, however, that if your reactor is blocked by some CPU-intensive task it will not send your echo in time.
  • If you're not in control of how much data comes in, you'll have to throttle or penalize connections that would otherwise take up all of your server time.
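Here's a rough sketch of the msgpack + zlib encoding mentioned above (assumes the msgpack Python package is installed; the message structure is just an example):

    import zlib
    import msgpack

    def pack_batch(messages):
        # messages: a list of dicts, e.g. {"from": "a", "to": "b", "text": "hi"}
        payload = msgpack.packb(messages, use_bin_type=True)
        return zlib.compress(payload)

    def unpack_batch(blob):
        return msgpack.unpackb(zlib.decompress(blob), raw=False)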
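And a sketch of the 100 ms batching using Twisted's LoopingCall (pack_batch() is from the sketch above; send_over_wire() is a placeholder for whatever transport you end up using):

    from twisted.internet import reactor, task

    outgoing = []  # messages queued up since the last flush

    def queue_message(msg):
        outgoing.append(msg)

    def flush():
        if not outgoing:
            return
        batch = outgoing[:]
        del outgoing[:]
        blob = pack_batch(batch)   # encode and compress the whole batch at once
        send_over_wire(blob)       # placeholder for your actual transport

    task.LoopingCall(flush).start(0.1)  # flush every 100 ms
    reactor.run()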
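The dirty-flag approach for slow database writes could look roughly like this (save_to_datastore() is a placeholder for your actual, slow write):

    from twisted.internet import task

    dirty = set()  # ids of objects changed since the last save

    def mark_dirty(obj_id):
        dirty.add(obj_id)

    def persist_dirty():
        if not dirty:
            return
        to_save = list(dirty)
        dirty.clear()
        for obj_id in to_save:
            save_to_datastore(obj_id)  # placeholder for the slow DB write

    task.LoopingCall(persist_dirty).start(5.0)  # persist every 5 seconds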
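And a sketch of the keep-alive check (the connection objects here are placeholders with made-up drop()/send_ping() methods; in Twisted you'd hook this into your protocol):

    import time
    from twisted.internet import task

    PING_INTERVAL = 5.0   # seconds between pings
    PONG_TIMEOUT = 15.0   # seconds without a pong before we drop the connection

    connections = set()   # live connection objects (placeholder wrappers)
    last_pong = {}        # connection -> timestamp of the last pong received

    def on_pong(conn):
        last_pong[conn] = time.time()

    def heartbeat():
        now = time.time()
        for conn in list(connections):
            if now - last_pong.get(conn, now) > PONG_TIMEOUT:
                conn.drop()        # placeholder: close the dead connection
                connections.discard(conn)
            else:
                conn.send_ping()   # placeholder: write a ping frame

    task.LoopingCall(heartbeat).start(PING_INTERVAL)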

EDIT: I only now see that you're looking into Redis. As far as I know it's a good queueing system; use that if you can. Implementing the stuff above yourself would take a lot of time to get right.
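A minimal sketch of using Redis as a per-recipient queue, assuming the redis Python package and a reachable Redis endpoint (the host name is a placeholder):

    import redis

    r = redis.Redis(host="my-redis-host", port=6379)  # placeholder endpoint

    def enqueue(recipient_id, blob):
        # push the packed message onto a per-recipient list
        r.rpush("queue:%s" % recipient_id, blob)

    def drain(recipient_id):
        # pop everything queued for this recipient when they come online
        messages = []
        while True:
            blob = r.lpop("queue:%s" % recipient_id)
            if blob is None:
                break
            messages.append(blob)
        return messages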

Upvotes: 1
