VP.

Reputation: 5141

MySQL Query Optimization

I have the following SQL query:

select expr1, operator, expr2, count(*) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;

Problem: The count(*) in my ORDER BY is killing my application, probably because it can't use an index. I would like to know if there is any way to make it faster, for example by wrapping it in a subquery, or something like that.

The EXPLAIN for my SELECT:

+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
| id | select_type | table                | type  | possible_keys | key   | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | log_keyword_fulltext | index | NULL          | expr1 | 208     | NULL | 110000 | Using index; Using temporary; Using filesort | 
+----+-------------+----------------------+-------+---------------+-------+---------+------+--------+----------------------------------------------+

UPDATED:

I tried a subquery, like this:

select * from (select b.expr1,b.operator,b.expr2,count(*) as c 
from log_keyword_fulltext b group by b.expr1,b.operator,b.expr2) x 
order by x.c desc limit 2000;

It works, but it is not any faster. Here is the EXPLAIN:

+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
| id | select_type | table      | type  | possible_keys | key   | key_len | ref  | rows   | Extra          |
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL  | NULL    | NULL |  38398 | Using filesort | 
|  2 | DERIVED     | b          | index | NULL          | expr1 | 208     | NULL | 110000 | Using index    | 
+----+-------------+------------+-------+---------------+-------+---------+------+--------+----------------+

You can see that it is no longer using a temporary table, but the performance is the same. Any recommendations?

Upvotes: 0

Views: 917

Answers (6)

Chris

Reputation: 317

The best way to prevent a table scan would be to add a covering index over the fields you regularly access. There is a one-time cost to create the index, plus some additional cost on INSERT and DELETE operations so the index can be kept up to date.

A covering index prevents the database from having to read the entire row into memory just to get the values of the few fields you care about; the entire query can be answered from the index.

ALTER TABLE `log_keyword_fulltext` ADD INDEX `idx_name`(expr1, operator, expr2)

If these are not actual columns but rather expressions over columns, such as left(foo, 20), you can index just the prefix of the column that your future SELECT or WHERE clauses will use.
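For instance, a prefix index might look like this (a sketch: the idx_expr1_prefix name and the 20-character length are arbitrary illustrations, assuming expr1 is a string column):

ALTER TABLE `log_keyword_fulltext` ADD INDEX `idx_expr1_prefix` (expr1(20));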

See this page for other optimization ideas.

Upvotes: 0

dkretz

Reputation: 37645

What am I missing? I don't see a WHERE clause, so it looks to me like you're requesting a full table scan.

If you are counting on your LIMIT clause to save you, you're out of luck: the expensive part is the COUNT aggregate, which has to be computed for every group before anything can be sorted or limited.

Upvotes: 1

Paul Tomblin

Reputation: 182772

Trying to count and then sort by that count is going to be a killer. I would suggest building a temporary table with the counts, and then doing a SELECT ... ORDER BY on that.

Not sure if this works in MySQL, but in PostgreSQL or Oracle that would be (Oracle would need ROWNUM or FETCH FIRST rather than LIMIT):

create table foo as 
   select expr1, operator, expr2, count(*) as c
   from log_keyword_fulltext 
   group by expr1, operator, expr2;
select * from foo order by c desc limit 2000;
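
For what it's worth, MySQL does support the same pattern; a minimal sketch using a temporary table (so it is dropped automatically when the session ends):

create temporary table foo as 
   select expr1, operator, expr2, count(*) as c
   from log_keyword_fulltext 
   group by expr1, operator, expr2;
select * from foo order by c desc limit 2000;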

Also, all the counts have to be computed before they can be sorted, so the LIMIT clause won't spare you any of that work.

Upvotes: -1

MarkR

Reputation: 63538

You are running a query that needs to scan the whole table, and that does not scale. Since there is no WHERE clause, it absolutely must examine every row.

Consider maintaining one or more summary tables instead of running this query frequently.
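A minimal sketch of what such a summary table could look like, kept current by an insert trigger. The column types below are assumptions; match them to the real log_keyword_fulltext definition, and if rows are ever deleted from the log, a matching AFTER DELETE trigger would need to decrement the counts:

CREATE TABLE keyword_counts (
    expr1    VARCHAR(64) NOT NULL,  -- assumed type
    operator VARCHAR(16) NOT NULL,  -- assumed type
    expr2    VARCHAR(64) NOT NULL,  -- assumed type
    c        INT         NOT NULL DEFAULT 0,
    PRIMARY KEY (expr1, operator, expr2),
    KEY (c)
);

CREATE TRIGGER log_keyword_fulltext_ai
AFTER INSERT ON log_keyword_fulltext
FOR EACH ROW
    INSERT INTO keyword_counts (expr1, operator, expr2, c)
    VALUES (NEW.expr1, NEW.operator, NEW.expr2, 1)
    ON DUPLICATE KEY UPDATE c = c + 1;

The expensive report then becomes a cheap indexed read:

SELECT expr1, operator, expr2, c
FROM keyword_counts
ORDER BY c DESC LIMIT 2000;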

Upvotes: 2

niXar

Reputation: 680

What do you mean by "killing your application"? What is the context? How often do you run this query? What else is going on in the database while it runs? Does this particular result have to be real-time? What are the conditions (inserts/s, selects/s, DB size, etc.)?

Here's what you could do:

  1. Store the count in a separate table, which you'd update with triggers on insert/delete

  2. If you can't coerce MySQL into doing this with a single table swipe and some magic, build the counts in an indexed temporary table yourself, something like this (the column types below are placeholders):

    CREATE TEMPORARY TABLE t (
        e1 VARCHAR(64) NOT NULL,   -- placeholder type: match expr1
        op VARCHAR(16) NOT NULL,   -- placeholder type: match operator
        e2 VARCHAR(64) NOT NULL,   -- placeholder type: match expr2
        c  INT NOT NULL,
        PRIMARY KEY (e1, op, e2),  -- lets the insert spot repeated triples
        KEY (c)                    -- supports the final ORDER BY
    );
    -- one pass over the log: insert new triples, bump the count on repeats
    INSERT INTO t (e1, op, e2, c)
        SELECT expr1, operator, expr2, 1 FROM log_keyword_fulltext
        ON DUPLICATE KEY UPDATE c = c + 1;
    SELECT * FROM t ORDER BY c DESC LIMIT 2000;
    

Option 1 is probably what you want. And forget indexes: this query has to scan the whole table anyway.

Upvotes: 0

Samiksha

Reputation: 6182

Try counting a single column instead of using count(*), since count(*) considers every column of every row and so takes longer.

For example:

select expr1, operator, expr2, count(expr1) as c 
from log_keyword_fulltext 
group by expr1, operator, expr2 
order by c desc limit 2000;

Upvotes: 1
