Reputation: 45
I'm trying to optimize an SQL query as it is slow, and gets slower when the query result is high.
SELECT *
FROM comments
WHERE
DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
AND comments.group_id = " . $group_id . "
AND comments.user_id != " . $user_id . "
AND NOT EXISTS (
SELECT *
FROM reads
WHERE
comments.post_id = reads.notification_id
AND comments.group_id = reads.group_id
AND reads.user_id = " . $user_id . "
AND comments.nature1 = reads.notification_type
AND comments.created_on < reads.read_date
)
LIMIT 8
There are indexes concerned fields and Tables are quite big.
Upvotes: 0
Views: 230
Reputation: 222402
As a starter, this condition:
DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
Should be rewritten as:
created_on >= current_date
This is functionally equivalent, and not using date functions on the column being filtered gives the database a chance to use an index.
Then, consider the following indexes:
comments(group_id, created_on, user_id)
reads(notification_id, group_id, user_id, notification_type, created_on)
These are two multi-column indexes (called compound indexes), not individual indexes on each column. You'll notice that they match the where
predicates of the query and subquery. The ordering of columns in the index is important (especially in the index on comments
): you want columns that have equality predicates first, then columns with inequality predicates.
Finally: do you really need select *
? It is better to reduce the list to the columns you actually need; if there is just a few of them, you might want to try and add them to the index on comments
.
Side notes:
limit
without order by
is usually not useful. This gives you an arbitrary set of rows out of those that match - and the results may not be consistent over consecutive executions of the same query over the same dataset
consider using prepared statements rather than concatenating variables in the query string; this allows MySQL to recognize the query, and reuse an already prepared execution plan (that's a tiny gain, but always good to take) - and, more important, it prevents SQL injection.
Upvotes: 1
Reputation: 23
In addition to all the answers mentioned here, here's a general advice.
An EXPLAIN
in front of a SELECT
query shows you how the query will be executed.
i.e. EXPLAIN SELECT * FROM T1
Now, the column key_len shows you the size in bytes for one index item. The lower this value is the more index items fit into the same memory size an they can be faster processed. Rows shows you the expected number of rows the query needs to scan, the lower the better.
Upvotes: 0
Reputation: 6930
Minor potential improvement, replace the clause AND comments.group_id = reads.group_id
with AND reads.group_id = " . $group_id . "
This makes no difference because comments.group_id
is already always equal to $group_id
and the constant may be easier for the database to match. However, the database may already be making this optimisation internally, or otherwise running the query in a way where it can't take advantage of this.
Major concern: do not construct queries using the .
operator; instead, use the relevant function in your library or framework to separately pass the SQL query (with placeholders) and the values (group_id
and user_id
). Constructing queries using the .
operator is incredibly dangerous.
Upvotes: 0
Reputation: 1269463
For your query, you want indexes on:
comments(group_id, created_on, user_id)
reads(post_id, group_id, notification_type, user_id, created_on)
.Are these the two of the indexes that you have?
Upvotes: 0