rahulthewall

Reputation: 795

SQL Nested Query is slow

I have a database where users are represented by hashes. Each user (hash) has associated values (pertaining to the music track they were listening to). Since a user can listen to more than one track, there are repeat instances of the user and associated data (with the data differing because it now describes a different track).

What I would like to do is select ~10 users at random from this database, and then find their associated data.

Currently, the code that I am using is this:

SELECT *
FROM `tblPlayLogV4`
WHERE `titleId` <> 0 AND `hash` IN (SELECT `hash` FROM `tblPlayLogV4` WHERE `titleId` <> 0 AND RAND() <= 0.1 GROUP BY `hash` HAVING COUNT(`hash`) > 500);

Why RAND()? Because LIMIT is not allowed inside an IN subquery. The idea for using RAND() came from here: http://www.rndblog.com/how-to-select-random-rows-in-mysql/

The above query takes ages to complete.

If, however, I run the inner query separately, it finishes in 4.53 s. I then hard-code the result of the inner query into the outer query, and that finishes in about 275 ms. The separated queries are shown below:

SELECT `hash` FROM `tblPlayLogV4` WHERE `titleId` <> 0 AND RAND() <= 0.1 GROUP BY `hash` HAVING COUNT(`hash`) > 500;

SELECT * FROM `tblPlayLogV4` WHERE `hash` IN ('-29e291921cccd06a5813bca17b7f7c3','-2c08232108dcd93c443d821165c2c79','-58285c1602072da713e51cc6cdc6313','-5bcc2c42482d5805277609a84474aef','-79ecab520d661a1d624de7e7b04f265','-e937c753a96fc9e441f83af97b08489','04d3f1e91e4e92970819190346405d2d','3f9f0cd502de38d47e39367cdfdd6722') AND `titleId`<>0;

Can someone please explain to me why this is happening? What is it that I am doing wrong? And if there is a better way for me to formulate my query, do tell me.
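For reference, this is roughly what I was trying to express. I have read that wrapping the subquery in a derived table works around the LIMIT restriction, though I have not verified whether it is any faster:

-- Untested sketch: MySQL rejects LIMIT directly inside an IN subquery,
-- but accepts it once the subquery is wrapped in a derived table.
SELECT *
FROM `tblPlayLogV4`
WHERE `titleId` <> 0
  AND `hash` IN (SELECT `hash`
                 FROM (SELECT `hash`
                       FROM `tblPlayLogV4`
                       WHERE `titleId` <> 0
                       GROUP BY `hash`
                       HAVING COUNT(`hash`) > 500
                       ORDER BY RAND()
                       LIMIT 10) AS t);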

Number of entries in the database: 6,322,605

Upvotes: 0

Views: 2047

Answers (2)

user359040


As Dems said, your existing query executes the RAND() selection in the sub-query once for every record in your main query's table.

So try rewriting your main query so that the random selection runs only once, in a derived table, like this:

SELECT f.*
FROM (SELECT `hash` 
      FROM `tblPlayLogV4` 
      WHERE `titleId` <> 0 AND RAND() <= 0.1 
      GROUP BY `hash` 
      HAVING COUNT(`hash`) > 500) r
JOIN `tblPlayLogV4` f ON r.`hash` = f.`hash` AND f.`titleId` <> 0;

Upvotes: 1

MatBailie

Reputation: 86775

The difference is that the first query is processing all 6,322,605 records, while the second query is processing only the handful of hashes you hard-coded. So yes, the first query will be slow. Basically you need a way of choosing 10 out of 6 million without processing all 6 million...


The 'simplest' alternative is for each record to be given a "sequence_id" column, and to index that column. You can then just generate 10 random values and directly pull out those records with SELECT * FROM table WHERE sequence_id IN (a,b,c,d,etc). This does, however, require you to ensure that the sequence_id has no gaps.
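A rough, untested sketch of that idea (the column name, index name, and the literal ids are only placeholders):

-- Add and index the sequence column, then back-fill it with 1..N
-- (in insertion order) before relying on it.
ALTER TABLE `tblPlayLogV4` ADD COLUMN `sequence_id` INT NOT NULL;
CREATE INDEX `ix_sequence_id` ON `tblPlayLogV4` (`sequence_id`);

-- Generate 10 random numbers between 1 and MAX(sequence_id) in your
-- application, then fetch those rows directly through the index:
SELECT *
FROM `tblPlayLogV4`
WHERE `sequence_id` IN (101, 20234, 550021, 1200456, 3000789,
                        3456001, 4100200, 5000500, 5900123, 6300001);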

So, do you ever DELETE from that table, or just append to it?

If you do delete from it, you could still ensure 'no-gaps' by taking records from the end of the table, and updating their sequence_id with the values just deleted (possibly with a trigger). The feasibility of this depends on how often you delete from the table.
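Something along these lines, purely as an illustration (note that a MySQL trigger generally can't modify the table it fires on, so in practice this would likely be run from application code after each delete):

SET @deleted_id = 12345;  -- placeholder: the sequence_id that was just deleted

-- Re-point the current highest sequence_id at the freed value so the
-- sequence stays gap-free. The derived table is needed because MySQL
-- won't let an UPDATE read directly from the table it is modifying.
UPDATE `tblPlayLogV4`
SET `sequence_id` = @deleted_id
WHERE `sequence_id` = (SELECT m
                       FROM (SELECT MAX(`sequence_id`) AS m
                             FROM `tblPlayLogV4`) AS x);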

Upvotes: 1
