Reputation: 900
I run a service that takes a user request and adds it to a table named `queue` that has 3 fields: `userid`, `queuenumber`, and `processed`.
I store tasks there that need to be processed (I can process fewer tasks than I receive requests).
Unfortunately, I don't really know much about MySQL optimization, and my processing script (which I run in 50 parallel instances) uses queries like this:
SELECT * FROM `queue` WHERE `processed` = 0 ORDER BY `queue`.`queuenumber`
When any instance of the code takes a row to process, it marks it `processed = 1` so it won't be taken up by the other instances.
So when I run 50 instances of this code, it puts a heavy load on the MySQL database (and I have only 4 GB of RAM).
Because of that, I get stats like these:
Traffic: received 15.8 MiB, sent 42.9 GiB
ø per hour: received 3 MiB, sent 8.1 GiB
Some stats from phpMyAdmin (screenshot):
How do I optimize this, so that I can run this code in parallel without selecting the whole table every time?
// EDIT:
The output of SHOW CREATE TABLE is:
CREATE TABLE `queue` (
  `userid` int(11) NOT NULL,
  `queuenumber` int(11) NOT NULL AUTO_INCREMENT,
  `processed` tinyint(1) NOT NULL,
  PRIMARY KEY (`queuenumber`),
  UNIQUE KEY `queuenumber` (`queuenumber`),
  UNIQUE KEY `userid` (`userid`)
) ENGINE=InnoDB AUTO_INCREMENT=121617 DEFAULT CHARSET=latin1
Upvotes: 1
Views: 185
Reputation: 2282
Implementing queues with an RDBMS is an anti-pattern: http://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you
However, if you insist, there are ways to do it consistently and without race conditions.
First, implement a notification (outside of MySQL) when a new row is enqueued, so you don't have to poll for new data in a tight loop; the polling may be why your traffic is so high. I have seen UDP multicast used for this, or UNIX signals.
Now check that the table is using InnoDB:
SHOW CREATE TABLE `queue`;
If you need to change it to InnoDB:
ALTER TABLE `queue` ENGINE=InnoDB;
If you have multiple consumers processing jobs, make sure your locking is race-free: when a consumer encounters a new job, the first thing it should do is update the row with its unique process ID, so that the row/job is assigned to exactly one consumer.
You will need a `process` column on your table:
ALTER TABLE `queue` ADD `process` int unsigned default NULL;
Add an index to help you select the rows efficiently:
ALTER TABLE `queue` ADD KEY (`processed`,`process`);
Now your consumers can run the following in their code:
UPDATE `queue` SET process = ? WHERE process IS NULL AND processed = 0 LIMIT 1;
You can set ? to the pid or thread ID of the consumer. You can tell whether the update claimed a row either by checking the affected-row count in the query response (fastest), or by trying to fetch the row details anyway; if you get back 0 rows, the update did nothing and a different consumer got the job.
SELECT * FROM `queue` WHERE process = ? AND processed = 0;
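Put together, the claim-and-check step could look like this (a sketch in plain SQL; the literal pid 12345, and the ORDER BY on `queuenumber` to preserve the FIFO order of your original query, are my additions):
UPDATE `queue` SET process = 12345
WHERE process IS NULL AND processed = 0
ORDER BY `queuenumber` LIMIT 1;

-- ROW_COUNT() is 1 if this session's UPDATE just claimed a row,
-- 0 if another consumer got there first
SELECT ROW_COUNT();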
When your consumer is finished with the row/job, it can just set `processed` to 1 on that row. (Maybe you should consider an auto-increment primary key on the table so you can accurately address a single row.)
ALTER TABLE `queue` ADD `id` bigint unsigned NOT NULL auto_increment FIRST, ADD primary key (`id`);
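The completion step is then a single-row update by primary key (a sketch; 42 stands in for the claimed row's id, and in your table `queuenumber` already fills this role):
-- mark the claimed job as done so it is never handed out again
UPDATE `queue` SET processed = 1 WHERE `id` = 42;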
If a consumer crashes for whatever reason, you can tell what was going on by looking at the row: if `processed` = 0 but `process` has been set, then something crashed while processing that row.
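One way to requeue such rows is to clear their claims (a sketch; in practice you should first verify that the owning processes are really dead):
-- release claims on rows that were claimed but never finished,
-- so live consumers can pick them up again
UPDATE `queue` SET process = NULL
WHERE processed = 0 AND process IS NOT NULL;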
Upvotes: 2