Optimizing the performance of MySQL regarding aggregation

Question

I'm trying to optimize a report query. Like most report queries, this one involves aggregation. Since the table is large and still growing, I need to pay attention to its performance.

For example, I have a table with three columns (id, name, action), and I would like to count the number of actions each name has performed:

SELECT name, COUNT(id) AS count
FROM tbl
GROUP BY name;

Simple as it is, I can't get it to run in an acceptable time. It can take 30 seconds, and no index I add is picked up by the optimizer, let alone improves the query.

When I run EXPLAIN on the above query, it never uses any of the table's indexes, not even the index on name.

Is there any way to improve the performance of aggregation? Why is the index not used?

[UPDATE]

Here's the EXPLAIN's output:

+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra           |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
|  1 | SIMPLE      | tbl   | ALL  | NULL          | NULL | NULL    | NULL | 4025567 |   100.00 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+

And here is the table's schema:

CREATE TABLE `tbl` (
  `id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `name` varchar(1000) NOT NULL,
  `action` int unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `inx` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Marcus Adams · Accepted Answer

The problem with your query and its use of indexes is twofold: the SELECT refers to two different columns (name and id) while your index contains only one, and the index covers only a prefix of name.

Try this, which refers to just the name column (since id is NOT NULL, COUNT(*) returns the same counts as COUNT(id)):

SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;

With the following index (no prefix):

tbl (name)

Don't use a prefix on the index for this query. A prefix index doesn't store the full column value, so MySQL can't use it as a covering index and still has to read the table rows.
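
As a rough sketch, the index change could look like the statement below. It assumes every existing name fits in 255 characters, so the column can be shortened enough for a full-column utf8 index to stay under InnoDB's 767-byte key limit for the older row formats (255 × 3 = 765 bytes). The index name inx_name is made up for illustration.

```sql
-- Assumption: no existing name is longer than 255 characters.
-- In strict SQL mode the ALTER fails if a longer value exists;
-- otherwise longer values are truncated, so check the data first.
ALTER TABLE tbl
  MODIFY `name` VARCHAR(255) NOT NULL,  -- shorten the column so it fits in an index key
  DROP INDEX `inx`,                     -- remove the old prefix index
  ADD INDEX `inx_name` (`name`);        -- hypothetical name; indexes the full column
```

On a table with millions of rows this rebuild can take a while, so it is probably best tried on a copy first.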

If you use the above, MySQL will scan the index on the name column without having to read the actual table data. You should see "Using index" in the Extra column of the EXPLAIN output.
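
To check whether the optimizer actually picks up the index, the rewritten query can be run through EXPLAIN again. This only restates the check described above; the exact output depends on your MySQL version and data.

```sql
EXPLAIN
SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;
-- What to look for: the key column is no longer NULL,
-- and the Extra column contains "Using index".
```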

This is as fast as MySQL will be able to accomplish such a task. The alternative is to store the aggregate result separately and keep it updated as your data changes.
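
One possible way to keep such a stored aggregate is a summary table maintained by triggers. This is a minimal sketch, not the only approach: the table name tbl_name_counts, the column cnt, and the trigger names are all hypothetical, and it assumes names fit in 255 characters so that name can serve as the primary key.

```sql
-- Hypothetical summary table: one row per name with its current count.
CREATE TABLE tbl_name_counts (
  `name` VARCHAR(255) NOT NULL,
  `cnt`  BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- One-time fill from the existing data (this still pays the full scan once).
INSERT INTO tbl_name_counts (`name`, `cnt`)
SELECT name, COUNT(*) FROM tbl GROUP BY name;

-- Increment the counter whenever a row is inserted into tbl.
CREATE TRIGGER tbl_count_after_insert AFTER INSERT ON tbl
FOR EACH ROW
  INSERT INTO tbl_name_counts (`name`, `cnt`) VALUES (NEW.name, 1)
  ON DUPLICATE KEY UPDATE `cnt` = `cnt` + 1;

-- Decrement the counter whenever a row is deleted from tbl.
CREATE TRIGGER tbl_count_after_delete AFTER DELETE ON tbl
FOR EACH ROW
  UPDATE tbl_name_counts SET `cnt` = `cnt` - 1 WHERE `name` = OLD.name;

-- The report then reads the precomputed counts directly.
SELECT `name`, `cnt` AS count FROM tbl_name_counts;
```

The sketch does not handle an UPDATE that changes name; that would need a third trigger. It also adds a small cost to every write on tbl, which is the trade-off for a cheap report.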

Also, consider reducing the size of the name column, especially if you're hitting index size limits, which is most likely why you're using the prefix in the first place. You can save more room by not using UTF8 if you don't need it (MySQL's utf8 reserves up to 3 bytes per character in an index).
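
Before shrinking the column or changing its character set, it may help to measure what the data actually contains. The query below is a sketch using the standard CHAR_LENGTH and LENGTH functions; the output column aliases are made up for illustration.

```sql
SELECT
  MAX(CHAR_LENGTH(name))                  AS max_chars,           -- longest name, in characters
  MAX(LENGTH(name))                       AS max_bytes,           -- longest name, in bytes
  SUM(LENGTH(name) <> CHAR_LENGTH(name))  AS rows_with_multibyte  -- rows containing multi-byte characters
FROM tbl;
```

If max_chars is well below 1000, the column can be shortened accordingly. If rows_with_multibyte is 0, every stored name is single-byte under utf8, which suggests a single-byte character set might be enough, though future data may differ.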
