Reputation: 65
I'm currently looking for a solution to a basic problem I have: the deletion of old records.
To explain the situation: I have a table, which I'll call table1, with a small number of records. It usually stays empty, as it is used to relay messages. These messages are read within two seconds of being added to the database, then deleted so that they aren't read again.
However, if one of the clients supposed to receive the messages from table1 goes offline, several messages can become pending. Sometimes hundreds. Sometimes thousands, or even hundreds of thousands, if not more.
Not only does this hurt the client's performance, since it has to process a huge number of messages; it also hurts the database's, which is kept in memory and should hold a minimal number of records.
Considering the clients check for new messages every second, what would be the best way to delete old records? I've thought about adding timestamps, but won't calculating a timestamp on every insert hurt performance? I tried it out, and all those queries ended up in the slow query log.
What would the best solution be? I've thought about something like checking whether the table was altered in the past 5 seconds; if not, we can be confident that all messages that should be relayed have been relayed already, and the table can be wiped. But how can this be done?
I've thought about events running every couple of minutes, but I'm not sure how to implement something that would have no (or negligible) impact on the select/insert/delete queries.
PS: This situation arose when I noticed that some clients were offline and there were 8 million messages pending.
EDIT:
I had forgotten to mention that the storage engine is MEMORY, so all records are kept in RAM. That's the main reason I want to get rid of these records: millions of records that shouldn't even be there are sitting in RAM and consuming system resources.
Here is an extract from the slow query log:
# Query_time: 0.000283 Lock_time: 0.000070 Rows_sent: 0 Rows_examined: 96
SET timestamp=1387199997;
DELETE FROM messages WHERE clientid='100';
[...]
# Query_time: 0.000178 Lock_time: 0.000054 Rows_sent: 0 Rows_examined: 96
SET timestamp=1387199998;
DELETE FROM messages WHERE clientid='14';
So these queries do have quite a small delay, but is it in any way meaningful in MySQL? In "real life", 0.0003 s could be completely ignored as insignificant; can the same be said for MySQL, over connections with approximately 10 ms ping?
Upvotes: 3
Views: 1448
Reputation: 29619
Your question is interesting, but doesn't include a lot of detail, so I can only offer general points of view.
Firstly, a number of message-queuing solutions already exist that may do what you need out of the box. They hide the underlying implementation of data storage, clean-up, etc., and let you focus on the application logic. RabbitMQ is a popular open-source option.
Secondly, unless you are working with constrained hardware, hundreds of thousands of records in a MySQL table is not a performance problem in most cases, nor is generating a timestamp on insert. So I would recommend building a solution that's obvious and straightforward (and therefore less error-prone): add a timestamp column to your message table, and find a way of removing messages older than 5 minutes. You could add this to the logic which cleans up the records after delivery. As long as your queries hit indexed columns, I don't think you have to worry about hundreds of thousands of records.
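A minimal sketch of that approach, assuming the table is named `messages` (the name in your slow-log extract) and a hypothetical column name `created_at`:

```sql
-- Add an auto-populated timestamp and index it so the cleanup DELETE
-- can use a range scan instead of examining every row. Note that
-- MEMORY tables default to HASH indexes, which cannot serve range
-- comparisons, hence the explicit USING BTREE.
ALTER TABLE messages
    ADD COLUMN created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ADD INDEX idx_created_at (created_at) USING BTREE;

-- Remove anything that has been pending for more than five minutes.
DELETE FROM messages
WHERE created_at < NOW() - INTERVAL 5 MINUTE;
```

Because the column has a `DEFAULT CURRENT_TIMESTAMP`, your insert statements don't need to change at all; MySQL fills the value in for you.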
I would put some energy into creating a performance test suite that allows you to experiment with solutions and see which is really faster. That can be tricky, especially if you want to test scenarios with multiple clients, but you will learn a lot more about the performance characteristics of the app by working through those scenarios.
EDIT:
You can have one column in your table automatically set a timestamp value - I've always found this to be extremely fast. As in - it's never been a problem on very large tables (tens of millions of rows).
I don't have much experience with the MEMORY storage engine, but the MySQL documentation suggests that data-modification actions (insert, update, delete) can be slow due to locking time. That's borne out by your statistics, where the lock time is roughly 25-30% of the total query time.
Upvotes: 3
Reputation: 108641
I've had a similar problem.
A couple of questions: first, how long should undelivered messages dwell in the system? Forever? A day? Ten seconds?
Second, what is the consequence of erroneously deleting an undelivered message? Does it cause the collapse of the global banking system? Does it cause a hospital patient not to receive a needed injection? Or does a subsequent message simply cover for the missing one?
The best situation is short dwell time and low error consequence. If the error consequence is high, none of this is wise.
Setting up the solution took several steps for me.
First, write some code to fetch the max id from the messages table.
SELECT MAX(message_id) AS max_message_id FROM message
Then, an hour later, or ten seconds, or a day, or whatever, delete all the messages with id numbers less than the recorded one from the previous run.
DELETE FROM message WHERE message_id <= ?max_message_id
If all is functioning correctly, there won't be anything to delete. But if you have a bunch of stale messages for a client that's gone walkabout, pow, they're gone.
Finally, before putting this into production, wait for a quiet moment in your system, and, just once, issue the command
TRUNCATE TABLE message
to clear out any old rubbish in the table.
You can do this with an event (a stored job in the MySQL database) by creating a little one-row, one-column table to store the max_message_id.
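One possible shape for that event, using hypothetical names (`message_purge_state`, `purge_stale_messages`) and assuming the event scheduler is enabled (`SET GLOBAL event_scheduler = ON;`):

```sql
-- One-row bookkeeping table holding the id recorded on the previous run.
CREATE TABLE message_purge_state (
    max_message_id INT NOT NULL
) ENGINE = MEMORY;
INSERT INTO message_purge_state VALUES (0);

-- When pasting into the mysql client, wrap this in DELIMITER
-- statements, since the body contains semicolons.
CREATE EVENT purge_stale_messages
ON SCHEDULE EVERY 1 HOUR
DO
BEGIN
    -- Delete everything at or below the previously recorded high-water mark...
    DELETE FROM message
    WHERE message_id <= (SELECT max_message_id FROM message_purge_state);

    -- ...then record the current maximum for the next run.
    UPDATE message_purge_state
    SET max_message_id = (SELECT COALESCE(MAX(message_id), 0) FROM message);
END
```

Any message still present an hour after it was recorded must have been sitting undelivered the whole time, which is exactly the stale case described above.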
EDIT
You can also alter your table to add a message_time column, in such a way that it gets set automatically whenever you insert a row. Issue these three statements at a time when your system is quiet and you can afford to trash all extant messages.
TRUNCATE TABLE message;
ALTER TABLE message ADD COLUMN message_time TIMESTAMP
NOT NULL
DEFAULT CURRENT_TIMESTAMP;
ALTER TABLE message ADD INDEX message_time (message_time);
Then you can just use a single statement to clean out the old records, like so.
DELETE FROM message WHERE message_time <= NOW() - INTERVAL 1 HOUR
(or whatever interval is appropriate). You should definitely run the ALTER on an empty or almost-empty table, because altering a table with lots of rows takes time.
This is a good solution because there's a chance you won't have to alter your message-processing client code at all. (Of course, if you used SELECT * anywhere, you probably will have to change it. Pro tip: never use SELECT * in application code.)
Upvotes: 2