Reputation: 2165
I want to improve the performance of the following query.
SELECT DISTINCT StringId
FROM translations
WHERE status = 1
AND TranslationId IN
(
SELECT Max(TranslationId)
FROM translations
WHERE languageId = 2
AND TranslationId > 0
GROUP BY StringId;
This query currently takes forever because In clause.
The following subquery
SELECT Max(TranslationId)
FROM translations
WHERE languageId = 2
AND TranslationId > 0
GROUP BY StringId;
returns 16,000 rows and IN clause with 16,000 rows is what is taking so long.
An improved query that I came up with is
SELECT DISTINCT t1.StringId
FROM translations t1
INNER JOIN
(
SELECT Max(TranslationId) MaxTranslationId,
StringId
FROM translations
WHERE languageId = 2
AND TranslationId > 0
GROUP BY StringId
) t2 ON t1.StringId = t2.StringId
AND t1.TranslationId = t2.MaxTranslationId
WHERE status = 1;
I hope the inner join would return only rows that match both StringId and Max(TranslationId) like just using In clause.
Can somebody give me an answer if this is a right way to approach?
Upvotes: 0
Views: 62
Reputation: 1271111
The following is an alternative approach that uses NOT EXISTS
:
SELECT DISTINCT t.StringId
FROM translations t
WHERE status = 1 AND
NOT EXISTS (select 1
from translations t2
where t2.languageId = 2 and
t2.StringId = t.StringId and
t2.translationId > t.translationId
) AND
EXISTS (select 1
from translations t2
where t2.languageId = 2 and
t2.StringId = t.StringId and
t2.translationId > 0
)
For performance, you want an index on translations(StringId, languageId, translationId)
. This eliminates one of the aggregations, replacing it with an index lookup, which very well might be faster.
EDIT: The above should work. But, this might also be relatively efficient:
SELECT DISTINCT StringId
FROM translations t
WHERE status = 1 AND
t.TranslationId = (SELECT Max(t2.TranslationId)
FROM translations t2
WHERE t2.languageId = 2 AND t2.TranslationId > 0 AND
t2.StringId = t.StringId
);
Upvotes: 2