Reputation: 16861
I'm trying to optimize a report query, as most of report queries this one incorporates aggregation. Since the size of table is considerable and growing, I need to tend to its performance.
For example, I have a table with three columns: id
, name
, action
. And I would like to count the number of actions each name has done:
SELECT name, COUNT(id) AS count
FROM tbl
GROUP BY name;
As simple as it gets, I can't run it in a acceptable time. It might take 30 seconds and there's no index, whatsoever, I can add which is taken into account, nevertheless improves it.
When I run EXPLAIN
on the above query, it never uses any of indices of the table, i.e. an index on name
.
Is there any way to improve the performance of aggregation? Why the index is not used?
[UPDATE]
Here's the EXPLAIN
's output:
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
| 1 | SIMPLE | tbl | ALL | NULL | NULL | NULL | NULL | 4025567 | 100.00 | Using temporary |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------+-----------------+
And here is the table's schema:
CREATE TABLE `tbl` (
`id` bigint(20) unsigned NOT NULL DEFAULT '0',
`name` varchar(1000) NOT NULL,
`action` int unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `inx` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Upvotes: 1
Views: 49
Reputation: 53870
The problem with your query and use of indexes is that you refer to two different columns in your SELECT
statement yet only have one column in your indexes, plus the use of a prefix on the index.
Try this (refer to just the name column):
SELECT name, COUNT(*) AS count
FROM tbl
GROUP BY name;
With the following index (no prefix):
tbl (name)
Don't use a prefix on the index for this query because if you do, MySQL won't be able to use it as a covering index (will still have to hit the table).
If you use the above, MySQL will scan through the index on the name
column, but won't have to scan the actual table data. You should see USING INDEX
in the explain result.
This is as fast as MySQL will be able to accomplish such a task. The alternative is to store the aggregate result separately and keep it updated as your data changes.
Also, consider reducing the size of the name
column, especially if you're hitting index size limits, which you most likely are hence why you're using the prefix. Save some room by not using UTF8 if you don't need it (UTF8 is 3 bytes per character for index).
Upvotes: 2
Reputation: 37365
It's a very common question and key for solution lies in fact, that your table is growing.
So, first way would be: to create index by name
column if it isn't created yet. But: this will solve your issue for a time.
More proper approach would be: to create separate statistics table like
tbl_counts +------+-------+ | name | count | +------+-------+
And store your counts separately. When changing (insert/update or delete) your data in tbl
table - you'll need to adjust corresponding row inside tbl_counts
table. This way allows you to get rid of performing COUNT
query at all - but will need to add some logic inside tbl
table.
To maintain integrity of your statistics table you can either use triggers or do that inside application. This method is good if performance of COUNT
query is much more important for you than your data changing queries (but overhead from changing tbl_counts
table won't be too high)
Upvotes: 1