sir_gelato
sir_gelato

Reputation: 309

Trying to optimize a query that is crippling performance

Having terrible problems with the below query crippling performance of a script, with it often taking 10-30 seconds to complete. Wondering if anyone might have any optimization advice, specific or general is fine--I'm no querysmith.

Tinkering with types and indexes as well as the query itself is certainly doable.

SELECT DISTINCT t1.column_1, t1.column_2
FROM TABLE_1 AS t1
LEFT JOIN TABLE_1 AS t2
    ON t1.column_1 = t2.column_1
    AND t1.column_3 = t2.column_3
    AND t2.int_value = 1
    AND t2.column_4 = 'test_string_1'
WHERE t1.column_5 = 'text_string_2';

Size of TABLE_1 ~ 6 million rows

Structure of TABLE_1:

+--------------+--------------+------+-----+-------------------+-----------------------------+
| Field        | Type         | Null | Key | Default           | Extra                       |
+--------------+--------------+------+-----+-------------------+-----------------------------+
| id           | int(11)      | NO   | PRI | NULL              | auto_increment              |
| column_1     | bigint(12)   | YES  | MUL | NULL              |                             |
| column_4     | varchar(100) | YES  | MUL | NULL              |                             |
| column_5     | varchar(140) | YES  |     | NULL              |                             |
| column_2     | varchar(15)  | YES  | MUL | NULL              |                             |
| int_value    | int(1)       | YES  | MUL | NULL              |                             |
| last_updated | timestamp    | NO   | MUL | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+--------------+------+-----+-------------------+-----------------------------+

Upvotes: 0

Views: 50

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

For this query, you want the correct indexes:

SELECT DISTINCT t1.column_1, t1.column_2
FROM TABLE_1 AS t1 LEFT JOIN
     TABLE_1 AS t2
     ON t1.column_1 = t2.column_1 AND
        t1.column_3 = t2.column_3 AND
        t2.int_value = 1 AND
        t2.column_4 = 'test_string_1'
WHERE t1.column_5 = 'text_string_2';

As written, this would be: TABLE_1(column_5, column_1, column3, column_2) and TABLE_2(column_1, column_3, int_value, column_4).

However, I think the query can be greatly simplified. A LEFT JOIN keeps all rows in the first table, regardless of whether or not the conditions match. The WHERE condition is only on the first table and the columns are only from the first table, so the query should be equivalent to:

SELECT DISTINCT t1.column_1, t1.column_2
FROM TABLE_1 AS t1 
WHERE t1.column_5 = 'text_string_2';

It is possible that the DISTINCT is not necessary. But the best index for this simplified version is TABLE_1(column_5, column_1, column_2).

Note: If you made a mistake in writing the query in the question, please ask another question instead of invalidating this answer.

Upvotes: 1

sqlerty
sqlerty

Reputation: 27

Get rid of the DISTINCT and try the HAVING statement: maybe this will be faster:

SELECT t1.column_1, t1.column_2
FROM TABLE_1 AS t1
   LEFT JOIN TABLE_1 AS t2
   ON t1.column_1 = t2.column_1
   AND t1.column_3 = t2.column_3

HAVING t1.column_5 = 'text_string_2' AND t2.column_4 = 'test_string_1'   AND t2.int_value = 1  ;

Upvotes: 0

Related Questions