Reputation: 11677
I'm trying to figure out how to optimize a very slow query in MySQL (I didn't design this):
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
+----------+
| COUNT(*) |
+----------+
| 3224022 |
+----------+
1 row in set (1 min 0.16 sec)
Comparing that to a full count:
select count(*) from change_event;
+----------+
| count(*) |
+----------+
| 6069102 |
+----------+
1 row in set (4.21 sec)
The explain statement doesn't help me here:
explain SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: me
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: NULL
rows: 4120213
Extra: Using where; Using index
1 row in set (0.00 sec)
OK, it still thinks it needs roughly 4 million entries to count, but I could count lines in a file faster than that! I don't understand why MySQL is taking this long.
Here's the table definition:
CREATE TABLE `change_event` (
`change_event_id` bigint(20) NOT NULL default '0',
`timestamp` datetime NOT NULL,
`change_type` enum('create','update','delete','noop') default NULL,
`changed_object_type` enum('Brand','Broadcast','Episode','OnDemand') NOT NULL,
`changed_object_id` varchar(255) default NULL,
`changed_object_modified` datetime NOT NULL default '1000-01-01 00:00:00',
`modified` datetime NOT NULL default '1000-01-01 00:00:00',
`created` datetime NOT NULL default '1000-01-01 00:00:00',
`pid` char(15) default NULL,
`episode_pid` char(15) default NULL,
`import_id` int(11) NOT NULL,
`status` enum('success','failure') NOT NULL,
`xml_diff` text,
`node_digest` char(32) default NULL,
PRIMARY KEY (`change_event_id`),
KEY `idx_change_events_changed_object_id` (`changed_object_id`),
KEY `idx_change_events_episode_pid` (`episode_pid`),
KEY `fk_import_id` (`import_id`),
KEY `idx_change_event_timestamp_ce_id` (`timestamp`,`change_event_id`),
KEY `idx_change_event_status` (`status`),
CONSTRAINT `fk_change_event_import` FOREIGN KEY (`import_id`) REFERENCES `import` (`import_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Version:
$ mysql --version
mysql Ver 14.12 Distrib 5.0.37, for pc-solaris2.8 (i386) using readline 5.0
Is there something obvious I'm missing? (Yes, I've already tried "SELECT COUNT(change_event_id)", but there's no performance difference).
Upvotes: 57
Views: 91986
Reputation: 847
To make the search more efficient, although I recommend adding index. I leave the command for you to try the metrics again
CREATE INDEX ixid_1 ON change_event (change_event_id);
and repeat query
SELECT COUNT(*) FROM change_event me WHERE change_event_id > '1212281603783391';
-JACR
Upvotes: 0
Reputation: 16735
Here are a few things I suggest:
Change the column from a "bigint" to an "int unsigned". Do you really ever expect to have more than 4.2 billion records in this table? If not, then you're wasting space (and time) the the extra-wide field. MySQL indexes are more efficient on smaller data types.
Run the "OPTIMIZE TABLE" command, and see whether your query is any faster afterward.
You might also consider partitioning your table according to the ID field, especially if older records (with lower ID values) become less relevant over time. A partitioned table can often execute aggregate queries faster than one huge, unpartitioned table.
EDIT:
Looking more closely at this table, it looks like a logging-style table, where rows are inserted but never modified.
If that's true, then you might not need all the transactional safety provided by the InnoDB storage engine, and you might be able to get away with switching to MyISAM, which is considerably more efficient on aggregate queries.
Upvotes: 16
Reputation: 482
MySQL does say "Using where" first, since it does need to read all records/values from the index data to actually count them. With InnoDb it also tries to "grab" that 4 mil record range to count it.
You may need to experiment with different transaction isolation levels: http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html#isolevel_read-uncommitted
and see which one is better.
With MyISAM it would be just fast, but with intensive write model will result in lock issues.
Upvotes: 2
Reputation: 48386
InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages. In order to do a range scan you still have to scan through all of the potentially wide rows in data pages; note that this table contains a TEXT column.
Two things I would try:
optimize table
. This will ensure that the data pages are physically stored in sorted order. This could conceivably speed up a range scan on a clustered primary key.(you also probably want to make the change_event_id column bigint unsigned if it's incrementing from zero)
Upvotes: 62
Reputation: 1344
Check to see how fragmented your indexes are. At my company we have a nightly import process that trashes our indexes and over time it can have a profound impact on data access speeds. For example we had a SQL procedure that took 2 hours to run one day after de-fragmenting the indexes it took 3 minutes. we use SQL Server 2005 ill look for a script that can check this on MySQL.
Update: Check out this link: http://dev.mysql.com/doc/refman/5.0/en/innodb-file-defragmenting.html
Upvotes: 4
Reputation: 17400
I would create a "counters" table and add "create row"/"delete row" triggers to the table you are counting. The triggers should increase/decrease count values on "counters" table on every insert/delete, so you won't need to compute them every time you need them.
You can also accomplish this on the application side by caching the counters but this will involve clearing the "counter cache" on every insertion/deletion.
For some reference take a look at this http://pure.rednoize.com/2007/04/03/mysql-performance-use-counter-tables/
Upvotes: -1
Reputation: 124277
I've run into behavior like this before with IP geolocation databases. Past some number of records, MySQL's ability to get any advantage from indexes for range-based queries apparently evaporates. With the geolocation DBs, we handled it by segmenting the data into chunks that were reasonable enough to allow the indexes to be used.
Upvotes: 6
Reputation: 339786
Run "analyze table_name
" on that table - it's possible that the indices are no longer optimal.
You can often tell this by running "show index from table_name
". If the cardinality value is NULL
then you need to force re-analysis.
Upvotes: 1