Reputation: 59333
I have a MySQL table that is filled with mails from a postfix mail log. The table is updated very often, sometimes multiple times per second. Here's the SHOW CREATE TABLE output:
CREATE TABLE `postfix_mails` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`mail_id` varchar(20) COLLATE utf8_danish_ci NOT NULL,
`host` varchar(30) COLLATE utf8_danish_ci NOT NULL,
`queued_at` datetime NOT NULL COMMENT 'When the message was received by the MTA',
`attempt_at` datetime NOT NULL COMMENT 'When the MTA last attempted to relay the message',
`attempts` smallint(5) unsigned NOT NULL,
`from` varchar(254) COLLATE utf8_danish_ci DEFAULT NULL,
`to` varchar(254) COLLATE utf8_danish_ci NOT NULL,
`source_relay` varchar(100) COLLATE utf8_danish_ci DEFAULT NULL,
`target_relay` varchar(100) COLLATE utf8_danish_ci DEFAULT NULL,
`target_relay_status` enum('sent','deferred','bounced','expired') COLLATE utf8_danish_ci NOT NULL,
`target_relay_comment` varchar(4098) COLLATE utf8_danish_ci NOT NULL,
`dsn` varchar(10) COLLATE utf8_danish_ci NOT NULL,
`size` int(11) unsigned NOT NULL,
`delay` float unsigned NOT NULL,
`delays` varchar(50) COLLATE utf8_danish_ci NOT NULL,
`nrcpt` smallint(5) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `mail_signature` (`host`,`mail_id`,`to`),
KEY `from` (`from`),
KEY `to` (`to`),
KEY `source_relay` (`source_relay`),
KEY `target_relay` (`target_relay`),
KEY `target_relay_status` (`target_relay_status`),
KEY `mail_id` (`mail_id`),
KEY `last_attempt_at` (`attempt_at`),
KEY `queued_at` (`queued_at`)
) ENGINE=InnoDB AUTO_INCREMENT=111592 DEFAULT CHARSET=utf8 COLLATE=utf8_danish_ci
I want to know how many mails were relayed through a specific host on a specific date, so I'm using this query:
SELECT COUNT(*) as `count`
FROM `postfix_mails`
WHERE `queued_at` LIKE '2016-04-11%'
AND `host` = 'mta03'
The query takes between 100 and 110 ms.
Currently the table contains about 70 000 mails, and the query returns around 31 000. This is only a couple of days' worth of mails, and I plan to keep at least a month. The query cache doesn't help much because the table is getting updated constantly.
I have tried doing this instead:
SELECT SQL_NO_CACHE COUNT(*) as `count`
FROM `postfix_mails`
WHERE `queued_at` >= '2016-04-11'
AND `queued_at` < '2016-04-12'
AND `host` = 'mta03'
But the query takes the exact same time to run. I have made these changes to the MySQL configuration:
[mysqld]
query_cache_size = 128M
key_buffer_size = 256M
read_buffer_size = 128M
sort_buffer_size = 128M
innodb_buffer_pool_size = 4096M
And confirmed that they are all in effect (SHOW VARIABLES), but the query doesn't run any faster.
Am I doing something stupid that makes this query take this long? Can you spot any obvious or non-obvious ways to make it faster? Is there another database engine that works better than InnoDB in this scenario?
mysql> EXPLAIN SELECT SQL_NO_CACHE COUNT(*) as `count`
-> FROM `postfix_mails`
-> WHERE `queued_at` >= '2016-04-11'
-> AND `queued_at` < '2016-04-12'
-> AND `host` = 'mta03';
+----+-------------+---------------+------+--------------------------+----------------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+--------------------------+----------------+---------+-------+-------+-------------+
| 1 | SIMPLE | postfix_mails | ref | mail_signature,queued_at | mail_signature | 92 | const | 53244 | Using where |
+----+-------------+---------------+------+--------------------------+----------------+---------+-------+-------+-------------+
1 row in set (0.00 sec)
Upvotes: 1
Views: 89
Reputation: 1269713
queued_at is a datetime value. Don't use LIKE. That converts it to a string, preventing the use of indexes and imposing a full-table scan. Instead, you want an appropriate index and to fix the query.
The query is:
SELECT COUNT(*) as `count`
FROM `postfix_mails`
WHERE `queued_at` >= '2016-04-11' AND `queued_at` < DATE_ADD('2016-04-11', interval 1 day) AND
`host` = 'mta03';
Then you want a composite index on postfix_mails(host, queued_at). The host column needs to be first.
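For example, the index could be created like this (the index name is just illustrative):
ALTER TABLE `postfix_mails`
ADD INDEX `host_queued_at` (`host`, `queued_at`);
With host first, the equality match on host narrows the index before the range scan on queued_at.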
Note: If your current version is counting 31,000 out of 70,000 emails, then an index will not be much help for that. However, this will make the code more scalable for the future.
Upvotes: 2
Reputation: 425341
If you need your query to be really fast, you'll need to materialize it.
MySQL lacks a way to do that natively, so you'll have to create a table like this:
CREATE TABLE mails_host_day
(
host VARCHAR(30) NOT NULL,
day DATE NOT NULL,
mails BIGINT NOT NULL,
PRIMARY KEY (host, day)
)
and update it either in a trigger on postfix_mails or with a script once in a while:
INSERT
INTO mails_host_day (host, day, mails)
SELECT host, CAST(queued_at AS DATE), COUNT(*)
FROM postfix_mails
WHERE id > :last_sync_id
GROUP BY
host, CAST(queued_at AS DATE)
ON DUPLICATE KEY
UPDATE mails = mails + VALUES(mails)
This way, querying a host-day entry is a single primary key seek.
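For instance, the original count for a given host and date would then become a primary-key lookup along these lines:
SELECT mails
FROM mails_host_day
WHERE host = 'mta03'
  AND day = '2016-04-11';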
Note that the trigger-based solution will affect DML performance, while the script-based solution will leave the summary slightly behind the live data.
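For the trigger-based variant, a minimal sketch could look like this (the trigger name is illustrative, and it assumes each inserted row represents one mail to count):
DELIMITER //
CREATE TRIGGER postfix_mails_count_ai
AFTER INSERT ON postfix_mails
FOR EACH ROW
BEGIN
    -- keep the per-host, per-day counter in step with every new row
    INSERT INTO mails_host_day (host, day, mails)
    VALUES (NEW.host, CAST(NEW.queued_at AS DATE), 1)
    ON DUPLICATE KEY UPDATE mails = mails + 1;
END//
DELIMITER ;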
However, you can improve the freshness of the script-based solution if you union the most recent, not-yet-aggregated data with the stored results:
SELECT host, day, SUM(mails) AS mails
FROM (
    SELECT host, day, mails
    FROM mails_host_day
    UNION ALL
    SELECT host, CAST(queued_at AS DATE) AS day, COUNT(*) AS mails
    FROM postfix_mails
    WHERE id >= :last_sync_id
    GROUP BY
        host, CAST(queued_at AS DATE)
) q
GROUP BY
    host, day
It's not a single index seek anymore; however, if you run the update script often enough, there will be fewer recent records left to read.
Upvotes: 1
Reputation: 6683
You could use pagination to speed up queries in PHP, which is usually how I resolve anything that contains a large amount of data - but this depends on your table hierarchy.
Integrate your LIMIT in the SQL query.
PHP:
// assuming $db is a PDO connection
$stmt = $db->prepare("SELECT COUNT(*) AS `count`
    FROM `postfix_mails`
    WHERE DATEDIFF(`queued_at`, '2016-04-11') = 0
      AND `mail_id` < :limit");
$stmt->execute(array(':limit' => $_POST['limit']));

foreach ($stmt as $row)
{
    // normal output
}
jQuery:
$(document).ready( function() {
    var starting = 1;
    $('#next').click( function() {
        starting = starting + 10;
        $.post('phpfilehere.php', { limit: starting })
            .done( function(data) {
                $('#mail-output').html(data);
            });
    });
});
Here, each page shows 10 more emails; of course you can change this and modify it, and even add a search (I actually have an object I use for this in all my projects).
I just thought I'd share the idea - it also adds real-time data flow to your site.
This was inspired by Facebook's scrolling "show more" - which really isn't hard to implement but is such a good way of querying a lot of data.
Upvotes: 0
Reputation: 1210
You have a unique key on 'host', 'mail_id', and 'to'; however, when the query engine tries to use that index, you aren't filtering on 'mail_id' and 'to', so it may not be as efficient. A solution could be to add another index just on 'host', or to add AND `mail_id` IS NOT NULL AND `to` IS NOT NULL to your query to fully utilize the existing unique index.
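For example, the two suggestions could be written like this (the index name is only illustrative):
ALTER TABLE `postfix_mails` ADD INDEX `idx_host` (`host`);

SELECT COUNT(*) as `count`
FROM `postfix_mails`
WHERE `queued_at` >= '2016-04-11'
  AND `queued_at` < '2016-04-12'
  AND `host` = 'mta03'
  AND `mail_id` IS NOT NULL
  AND `to` IS NOT NULL;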
Upvotes: 0