Reputation: 5409
For my university project, I'm developing a dynamic live chat website with rooms, user registration, etc. I've got the entire system planned out bar one aspect. The rooms.
A room is created by a user who is then an operator of that room. Users can join the room and talk within it. The system has to be scalable, accounting for hundreds of thousands if not millions of messages being sent a day.
I was going to create on table in my database called messages
, and have fields like this:
| r_id | u_id | message | timestamp |
r_id
and u_id
would be foreign keys to the room ID and user ID respectively. This would mean I would insert a new record when a user sends a message, and periodically run a SELECT statement for every client (say every 3 seconds or so) to get the recent messages. But because the table will be huge, running these statements might create a lot of overhead and take a long time.
Another way of implementing this would be to create a new database table for every room. Say a user creates 3 rooms called General
, Programming
and Gaming
. The database tables would be room_general
, room_programming
, and room_gaming
, each with fields like:
| u_id | message | timestamp |
This would drastically cut down on the amount of queries for each table, but might introduce problems when I come to program it.
So, what is the best way to do this?
I'm using MySQL with PHP and AJAX.
Upvotes: 1
Views: 5244
Reputation: 520
Whenever the user updates, you save the message to a cache specific to a room with a timestamp of when the message came in, while saving it to the database at the time. When the clients requests for new messages, if the user is not new in the chat room, you check the last time the user got served by the server and load the new messages from the cache for the request. But if the user is new, then you serve them from the database.
To improve scalability in this scenario, you have to set the expiration of the messages so that messages can expire after that time. Or implement an async method that deletes old messages based on their timestamp.
Upvotes: 0
Reputation: 1246
It is bad idea to create a table per room. Hard to implement and hard to support.
Dont worry about performance of selects because they will be wery simple:
SELECT * FROM messages WHERE r_id=X ORDER BY timestamp DESC LIMIT X,Y
Just make sure your (r_id, timestamp) indexed together in this order to make this select using index:
ALTER TABLE `messages` ADD KEY `IN_messages_room_time` (`r_id`, `timestamp`);
If you will still have problems with performance (probably you will not), just add a 1-3 seconds inmemory cache (using memcache) and fetch a messages from DB one time per 1-3 seconds.
Also look at the Apollo Clark's answer: https://stackoverflow.com/a/8673165/436932 to prevent storing huge amount of unneccessary old messages: you can just put it in to the MYISAM table archive
or simply delete.
Upvotes: 1
Reputation: 149
To speed up your database, have a look at indexing your tables: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
In your case I assume that you'd be SELECTing messages by r_id while doing a JOIN on the user table through u_id. I would index the r_id and u_id columns. I am by no means an expert on this subject as I've only done "what works" for my own projects. I don't understand every pro and con of indexing, just that indexing those columns that are typically used as, well, indexes, speeds things up. Google "mysql index tutorial", you'll find plenty more information.
Don't go nuts and index every column, you'll slow down your INSERTs and UPDATEs.
I also suggest that you purge the chat logs every few days / weeks, or move them to another server for archival purposes if that's what you want / need to do.
Upvotes: 1
Reputation: 806
Look into creating a "transaction table" for storing the messages. Basically, you need to decide, do I really want to log all of the messages ever posted to the room, or just the message posted this past month / week / day / hour. If you really want to have a history of every message ever written, then you would create two databases. If you don't want to keep a history of every message, then you just need one table.
Using a transaction table, here's how it would flow:
every 500msec or 3sec, every user in the room would query the transaction table to get the latest updates from the past 500msec or 3sec
SELECT * FROM message_transactions WHERE timestamp > 123456789
a CRON job runs every 5 min or 1 hour, and deletes all entries older then 5min or however long you want the history to be.
Be sure to synchronize and round the time that each user queries the transaction table, so that the MySQL query result caching will kick in. For example, round the timestamp to once every 1sec or every 500msec.
What'll happen now is the users only get the newest messages, and your database won't explode in size over time, or slow down. Doing this, you'll need to cache the history of messages on the client-side in JS.
On the flip side, you could just get a PHP to IRC library, and call it a day. Also, if you're curious about it, look into how Facebook implements their AJAX-based chat system.
Upvotes: 1
Reputation: 24579
You could potentially use memcached to hold recent chat messages in memory and do your database writes in bulk.
Using memcached as a database buffer for chat messages
Upvotes: 0