CoderSpinoza
CoderSpinoza

Reputation: 2165

Improving performance of MySQL query with a nested subquery with an IN clause

I want to improve the performance of the following query.

SELECT  DISTINCT StringId 
FROM    translations 
WHERE   status = 1  
AND     TranslationId IN 
( 
    SELECT  Max(TranslationId) 
    FROM    translations  
    WHERE   languageId = 2 
    AND     TranslationId > 0  
    GROUP BY StringId;

This query currently takes forever because In clause.

The following subquery

SELECT  Max(TranslationId) 
FROM    translations  
WHERE   languageId = 2 
AND     TranslationId > 0  
GROUP BY StringId;

returns 16,000 rows and IN clause with 16,000 rows is what is taking so long.

An improved query that I came up with is

SELECT  DISTINCT t1.StringId 
FROM    translations t1 
INNER JOIN 
(   
    SELECT  Max(TranslationId) MaxTranslationId, 
            StringId 
    FROM    translations  
    WHERE   languageId = 2 
    AND     TranslationId > 0  
    GROUP BY StringId
) t2    ON  t1.StringId = t2.StringId 
        AND t1.TranslationId = t2.MaxTranslationId 
WHERE   status = 1;

I hope the inner join would return only rows that match both StringId and Max(TranslationId) like just using In clause.

Can somebody give me an answer if this is a right way to approach?

Upvotes: 0

Views: 62

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271111

The following is an alternative approach that uses NOT EXISTS:

SELECT DISTINCT t.StringId 
FROM translations t
WHERE status = 1 AND
      NOT EXISTS (select 1
                  from translations t2
                  where t2.languageId = 2 and
                        t2.StringId = t.StringId and
                        t2.translationId > t.translationId
                 ) AND
      EXISTS (select 1
              from translations t2
              where t2.languageId = 2 and
                    t2.StringId = t.StringId and
                    t2.translationId > 0
                 )

For performance, you want an index on translations(StringId, languageId, translationId). This eliminates one of the aggregations, replacing it with an index lookup, which very well might be faster.

EDIT: The above should work. But, this might also be relatively efficient:

SELECT  DISTINCT StringId 
FROM    translations t
WHERE   status = 1 AND
        t.TranslationId = (SELECT  Max(t2.TranslationId) 
                           FROM    translations t2
                           WHERE   t2.languageId = 2 AND t2.TranslationId > 0 AND
                                   t2.StringId = t.StringId
                          );

Upvotes: 2

Related Questions