AB Moslih

Reputation: 45

SQL query with NOT EXISTS very slow

I'm trying to optimize a slow SQL query; it gets slower as the result set grows.

 SELECT * 
 FROM comments  
 WHERE 
     DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY) 
     AND comments.group_id = " . $group_id . " 
     AND comments.user_id != " . $user_id . "  
     AND NOT EXISTS ( 
         SELECT *
         FROM reads
         WHERE 
             comments.post_id = reads.notification_id 
             AND comments.group_id = reads.group_id  
             AND reads.user_id = " . $user_id . " 
             AND comments.nature1 = reads.notification_type
             AND comments.created_on < reads.read_date
     )  
LIMIT 8  

There are indexes on the fields concerned, and the tables are quite big.

Upvotes: 0

Views: 230

Answers (4)

GMB

Reputation: 222402

As a starter, this condition:

DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY) 

should be rewritten as:

created_on >= current_date 

This is functionally equivalent, and not using date functions on the column being filtered gives the database a chance to use an index.

Then, consider the following indexes:

comments(group_id, created_on, user_id)
reads(notification_id, group_id, user_id, notification_type, read_date)

These are two multi-column indexes (called compound indexes), not individual indexes on each column. You'll notice that they match the where predicates of the query and subquery. The ordering of columns in the index is important (especially in the index on comments): you want columns that have equality predicates first, then columns with inequality predicates.
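In MySQL, those compound indexes could be created like this (the index names are illustrative; the last column of the index on reads is read_date, the column the subquery filters on):

```sql
-- Equality columns first (group_id), then the range column (created_on),
-- then the remaining filtered column (user_id).
CREATE INDEX idx_comments_group_created_user
    ON comments (group_id, created_on, user_id);

-- All equality columns of the subquery, then the range column last.
CREATE INDEX idx_reads_notif_group_user_type_date
    ON reads (notification_id, group_id, user_id, notification_type, read_date);
```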

Finally: do you really need select *? It is better to reduce the list to the columns you actually need; if there are just a few of them, you might want to try adding them to the index on comments.

Side notes:

  • limit without order by is usually not useful: it gives you an arbitrary subset of the matching rows, and the results may not be consistent across consecutive executions of the same query over the same dataset

  • consider using prepared statements rather than concatenating variables into the query string; this allows MySQL to recognize the query and reuse an already prepared execution plan (that's a tiny gain, but always good to take) - and, more importantly, it prevents SQL injection.

Upvotes: 1

Khwarizmi

Reputation: 23

In addition to all the answers mentioned here, here's some general advice.

An EXPLAIN in front of a SELECT query shows you how the query will be executed.

e.g. EXPLAIN SELECT * FROM T1

The key_len column shows you the size in bytes of one index item. The lower this value, the more index items fit into the same amount of memory and the faster they can be processed. The rows column shows you the expected number of rows the query needs to scan; the lower, the better.
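For instance, you could run EXPLAIN on the question's query as-is (with the placeholders replaced by illustrative hard-coded values) and check the key, key_len and rows columns of the output:

```sql
EXPLAIN
SELECT *
FROM comments
WHERE DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
  AND comments.group_id = 42   -- illustrative value
  AND comments.user_id != 7    -- illustrative value
  AND NOT EXISTS (
      SELECT 1
      FROM reads
      WHERE comments.post_id = reads.notification_id
        AND comments.group_id = reads.group_id
        AND reads.user_id = 7  -- illustrative value
        AND comments.nature1 = reads.notification_type
        AND comments.created_on < reads.read_date
  )
LIMIT 8;
```

If the key column is NULL for either table, the query is not using an index at all, which would explain the slowness.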

Upvotes: 0

Jiří Baum

Reputation: 6930

  • Minor potential improvement: replace the clause AND comments.group_id = reads.group_id with AND reads.group_id = " . $group_id . "

    This makes no difference to the result, because comments.group_id is already always equal to $group_id, and the constant may be easier for the database to match. However, the database may already be making this optimisation internally, or may otherwise be running the query in a way where it can't take advantage of it.

  • Major concern: do not construct queries using the . operator; instead, use the relevant function in your library or framework to separately pass the SQL query (with placeholders) and the values (group_id and user_id). Constructing queries using the . operator is incredibly dangerous.
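As a sketch of what that looks like with PDO (assuming an existing connection in $pdo; the variable and placeholder names are illustrative):

```php
// Placeholders instead of string concatenation: PDO sends the values
// separately from the SQL text, so they can never be parsed as SQL.
// :user_id appears twice in the original query, so two distinct
// placeholders are used here (native prepares disallow reuse).
$sql = "SELECT *
        FROM comments
        WHERE DATE(created_on) > DATE_SUB(CURDATE(), INTERVAL 1 DAY)
          AND comments.group_id = :group_id
          AND comments.user_id != :user_id
          AND NOT EXISTS (
              SELECT 1
              FROM reads
              WHERE comments.post_id = reads.notification_id
                AND comments.group_id = reads.group_id
                AND reads.user_id = :user_id2
                AND comments.nature1 = reads.notification_type
                AND comments.created_on < reads.read_date
          )
        LIMIT 8";

$stmt = $pdo->prepare($sql);
$stmt->execute([
    ':group_id' => $group_id,
    ':user_id'  => $user_id,
    ':user_id2' => $user_id,
]);
$rows = $stmt->fetchAll();
```

The same pattern is available with mysqli, or through whichever query method your framework provides.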

Upvotes: 0

Gordon Linoff

Reputation: 1269463

For your query, you want indexes on:

  • comments(group_id, created_on, user_id)
  • reads(notification_id, group_id, notification_type, user_id, read_date)

Are these two of the indexes that you have?

Upvotes: 0

Related Questions