simo

Reputation: 24558

Persistent messages over channels

I am building an app using the Phoenix framework that will use thousands of channels; users will be sending messages with longitude/latitude info once per second.

So, I will be getting thousands of messages each second.

The question is, how should I store each message I receive in the DB? Please bear with me while I describe my thoughts, as I am considering the following options for handling message storage:

  1. Use the same connection process to store the message into DB.

    Possible caveat: Would this affect the latency and performance of the channel?

  2. Create a dedicated DB process for each channel to store its messages.

    Possible caveat: Then, if I have 100,000 channels, I will need another 100,000 processes to store messages. Is this fine, considering that Erlang processes are lightweight and cheap?

  3. Create one DB process for all channels, then each message from any channel will be queued and stored by this single DB process.

    Possible caveat: With one process storing all messages from thousands of channels, its message queue will grow large; will it be slow?

So, which is the recommended approach for storing messages arriving every second from thousands of channels?

EDIT

I will be using DynamoDB, which will scale to handle thousands of concurrent requests with ease.

Upvotes: 2

Views: 391

Answers (1)

Greg

Reputation: 8340

The most important question is whether the request in the channel can be acknowledged before it is written to the DB or not. You need to consider what would happen if the connection process replied to the client and then something happened to the DB so that the message was never written. If that data loss is acceptable, then the DB access can be completed asynchronously; if not, then it needs to be synchronous, i.e. reply to the client only after the DB has confirmed that it stored the request.

If the data can be stored in the database asynchronously, then it's better to either spawn a new process to complete the write or add it to a queue (options 2 and 3). If the data has to be stored synchronously, then it's easier to handle it in the same process (option 1). Please note that a request has to be copied between processes, which may affect performance if the DB write is handled in a different process and the message is large.
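The two strategies can be sketched in plain Elixir; `DB.put/1` here is a hypothetical stand-in for the real DynamoDB client call, not an actual API:

```elixir
defmodule DB do
  # Stand-in for the real DB client; pretends every write succeeds.
  def put(_message), do: :ok
end

defmodule Writer do
  # Option 1: synchronous -- the channel process blocks until the DB
  # confirms the write, and only then replies to the client.
  def store_sync(message) do
    :ok = DB.put(message)
    :ok
  end

  # Options 2/3: asynchronous -- reply immediately and let another
  # process complete the write. Note the message is copied into the
  # spawned task, which is the copying cost mentioned above.
  def store_async(message) do
    {:ok, _pid} = Task.start(fn -> DB.put(message) end)
    :ok
  end
end
```

In the asynchronous case the channel process is free to reply right away, but a DB failure inside the task is invisible to the client.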

It's possible to improve the reliability of the asynchronous write, for example by storing the request somewhere persistent before it's written to the DB, then replying to the client, and only then completing the DB write, which can be retried if the DB is down. But this complicates the design a bit.
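A minimal sketch of that persist-first idea, assuming a plain append-only file as the persistent store and a hypothetical `db_put` function (injected here so a failing DB can be simulated):

```elixir
defmodule DurableWriter do
  @log "pending_writes.log"

  # Append the message to a local log *before* acknowledging the
  # client, so a DB outage after the ack cannot lose the message.
  def store(message, db_put \\ &default_put/1) do
    File.write!(@log, :erlang.term_to_binary(message) <> "\n", [:append])
    try_write(message, db_put, _retries = 3)
  end

  # Out of retries: the message stays in the log for a later sweep.
  defp try_write(_message, _db_put, 0), do: {:error, :db_down}

  defp try_write(message, db_put, retries) do
    case db_put.(message) do
      :ok -> :ok
      {:error, _} -> try_write(message, db_put, retries - 1)
    end
  end

  # Stand-in for the real DB client; pretends every write succeeds.
  defp default_put(_message), do: :ok
end
```

A real version would also need a background process that replays the log entries whose DB write never succeeded.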

You also need to determine the bottleneck: what would the slowest part of the architecture be? If it's the DB, then it doesn't matter whether you create one queue of requests to the DB or a new process for each connection; the requests will pile up either in the memory of that single process or in the number of spawned processes.

I would probably determine how many parallel connections the DB can handle without sacrificing too much latency, and create a pool of processes of that size to handle the requests. Then I would create a queue to dispatch requests to those pooled processes. To handle bigger messages I would obtain a token (permission to write) from the queue and connect to the DB directly, to avoid copying the message around. That architecture would also be easier to extend if any bottlenecks are found later, e.g. persistently store incoming messages before they can be written to the DB, or balance requests to additional nodes when the DB is overloaded.
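A rough sketch of that token idea, assuming the pool is a single GenServer that only hands out write permissions, so the message itself never passes through (and is never copied into) the pool process:

```elixir
defmodule WritePool do
  use GenServer

  # `size` caps how many callers may talk to the DB at once.
  def start_link(size), do: GenServer.start_link(__MODULE__, size)

  def checkout(pool), do: GenServer.call(pool, :checkout)
  def checkin(pool), do: GenServer.cast(pool, :checkin)

  @impl true
  def init(size), do: {:ok, {size, :queue.new()}}

  # No token free: park the caller until one is returned.
  @impl true
  def handle_call(:checkout, from, {0, waiting}) do
    {:noreply, {0, :queue.in(from, waiting)}}
  end

  def handle_call(:checkout, _from, {free, waiting}) do
    {:reply, :token, {free - 1, waiting}}
  end

  # A token came back: hand it to the longest-waiting caller, if any.
  @impl true
  def handle_cast(:checkin, {free, waiting}) do
    case :queue.out(waiting) do
      {{:value, from}, rest} ->
        GenServer.reply(from, :token)
        {:noreply, {free, rest}}

      {:empty, _} ->
        {:noreply, {free + 1, waiting}}
    end
  end
end
```

A channel would call `checkout/1`, write to the DB directly from its own process, and then `checkin/1`; only the small `:token` atom crosses process boundaries, not the message.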

Upvotes: 2
