Reputation: 382102
I'm using the following query to extract the frequent short values from a mediumblob
column :
select bytes, count(*) as n
from pr_value
where bytes is not null && length(bytes)<11 and variable_id=5783
group by bytes order by n desc limit 10;
The problem I have is that this query takes too much time (about 10 seconds with less than 1 million records) :
mysql> select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+-------+----+
| bytes | n |
+-------+----+
| 32 | 21 |
| 27 | 20 |
| 52 | 20 |
| 23 | 19 |
| 25 | 19 |
| 26 | 19 |
| 28 | 19 |
| 29 | 19 |
| 30 | 19 |
| 31 | 19 |
+-------+----+
The table is as follows (unrelated columns not shown) :
mysql> describe pr_value;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| product_id | int(11) | NO | PRI | NULL | |
| variable_id | int(11) | NO | PRI | NULL | |
| author_id | int(11) | NO | PRI | NULL | |
| bytes | mediumblob | YES | MUL | NULL | |
+-------------+---------------+------+-----+---------+-------+
The type is mediumblob because most values are big. Less than 10% are short as the ones I'm looking for with this specific query.
I have the following indexes :
mysql> show index from pr_value;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| pr_value | 0 | PRIMARY | 1 | product_id | A | 8961 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 0 | PRIMARY | 3 | author_id | A | 842402 | NULL | NULL | | BTREE | | |
| pr_value | 1 | bytes | 1 | bytes | A | 842402 | 10 | NULL | YES | BTREE | | |
| pr_value | 1 | bytes | 2 | variable_id | A | 842402 | NULL | NULL | | BTREE | | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
MySQL explains my query like this :
mysql> explain select bytes, count(*) as n from pr_value where bytes is not null && length(bytes)<11 and variable_id=5783 group by bytes order by n desc limit 10;
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| 1 | SIMPLE | pr_value | range | bytes | bytes | 13 | NULL | 421201 | Using where; Using temporary; Using filesort |
+----+-------------+----------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
Note that the condition on the length of the bytes column can be removed without changing the duration.
What can I do to make this query fast ?
Of course i'd prefer not to have to add columns.
Upvotes: 1
Views: 1116
Reputation: 1052
your index on (bytes, variable_id) is not very smart. If you always have a variable_id clause in your queries you should add index with variable_id first :
(variable_id, bytes)
It depend on how discriminant variable_id is. But it sould help.
Another tip is to add a new indexed column with the result of "length(bytes)<11" :
update pr_value set small = length(bytes)<11;
Add a new index with (small,variable_id).
Upvotes: 2
Reputation: 2086
Why are you GROUP BY'ing the blob column? I'd imagine that's the bottleneck as then the Query actually does a compare against all blob columns to each other. Is it because you want unique values for the BLOB? I think the DISTINCT keyword might perform better than the GROUP BY.
Upvotes: 1