Esmay Kapol
Esmay Kapol

Reputation: 241

SQL statement is taking long

I've been running into this problem for quite a while now. I want to grab all the rows from table sentences, that are not present in another table (with some conditions).

I've already managed to make this work but the query is taking 2-3 seconds. I want to make this faster, but since I am not familiar with SQL I can't manage to get it working.

The particular query is:

/**
* Only find sentences that are allowed to say back.
* 1. WHERE NOT EXISTS sentences said by this bot said in the past 2 weeks
* 2. WHERE NOT EXISTS sentences said in the last 8 minutes
* 3. WHERE NOT EXISTS sentences said to this customer
* 4. WHERE EXIST sentences with [$translation] translation
*/
 select * from `sentences` where `keyword_id` = 396 
and not exists (select id from `customer_sentences` where sentences.id = customer_sentences.sentence_id and customer_sentences.bot_id = 1 and customer_sentences.created_at >= "2019-12-30 13:25:58") 
and not exists (select id from `customer_sentences` where sentences.id = customer_sentences.sentence_id and customer_sentences.created_at >= "2020-01-13 13:17:58") 
and not exists (select id from `customer_sentences` where sentences.id = customer_sentences.sentence_id and customer_sentences.customer_id = 153375) 
and exists (select id from `sentence_translations` where sentence_translations.sentence_id = sentences.id and sentence_translations.language_id = 1)

By removing every statement one by one, I've found out the below where not exist statement is slow:

and not exists (select id from `customer_sentences` where sentences.id = customer_sentences.sentence_id and customer_sentences.created_at >= "2020-01-13 13:17:58") 

I've managed to improve the speed from 30s to 2-3s by creating indexes on my customer_sentences table:

$table->index(['created_at', 'bot_id']);
$table->index(['bot_id']);
$table->index(['customer_id']);
$table->index(['bot_id', 'created_at', 'sentence_id']);

I read about using a left join instead of the where not exists statement, but I weren't be able to get this working.

Upvotes: 2

Views: 65

Answers (2)

Pan
Pan

Reputation: 341

Correlated subqueries can be inefficient.

You want to use LEFT JOIN together with the condition that some mandatory (NOT NULL) column in the right table IS NULL.

Try this:

SELECT
* 
FROM sentences AS s
LEFT JOIN customer_sentences AS cs ON s.id = cs.sentence_id AND cs.bot_id = a AND cs.created_at >= '2019-12-30 13:25:58'
LEFT JOIN customer_sentences AS cs2 ON s.id = cs2.sentence_id AND cs.created_at >= '2020-01-13 13:17:58'
LEFT JOIN customer_sentences AS cs3 ON s.id = cs3.sentence_id AND cs.customer_id = 153375
JOIN sentence_translations AS st ON s.id = st.sentence_id AND st.language_id = 1
WHERE cs.id IS NULL
AND cs2.id IS NULL
AND cs3.id IS NULL

I also noticed that your indexes are inefficient. Fixing those should have a bigger impact than optimizing the query. Your first subquery will use the compound index on ['bot_id', 'created_at', 'sentence_id']. The second subquery will only use the created_at part of ['created_at', 'bot_id']. Your third subquery will use the ['customer_id'] index. Your forth subquery will probably not use any index. Maybe it's a tiny table or you have indexes on that table as well, which you did not include in the question.

If you create a compound index in customer_sentences consisting of first sentence_id followed by created_at, it can be used by all subqueries (or joins) on that table. The subquery missing created_at will still be able to use the first part of this new index. It may be better to only index the sentence_id column. Your existing indexes should probably be removed. Indexes increase the time it takes to write to the table as each write will also need to update all indexes. They also increase the size on disk of your table as well as possibly memory consumption.

You should probably also create an index in sentence_translations on sentence_id.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269533

For this clause (which is equivalent to yours):

not exists (select 1
            from customer_sentences cs
            where sentences.id = cs.sentence_id and
                  cs.created_at >= '2020-01-13 13:17:58'
           ) 

You want an index on customer_sentences(sentence_id, created_at). The ordering is important, and you want both keys.

Upvotes: 1

Related Questions