Reputation: 207
I've got a problem with this query....or, not yet as much as I will soon. Currently the table 'marketingDatabase' is around 11k rows but within the next month, it will likely be near 100k rows and growing to possibly 500k by March.
I know using ORDER BY RAND() is not the way to go but it's the only thing that I've gotten to work. I've tried other things but the first WHERE statement seems to be throwing me off. I'm using PHP, so I could process some of this in PHP too.
What is the best way, with this query, to select a random row from rows that fit in the WHERE statement?
Here's the query:
SELECT id
FROM `marketingDatabase`
WHERE do_not_call != 'true'
AND status = 'Pending'
AND install_id = 'AN ID HERE'
AND NOT EXISTS(
SELECT recordID
FROM reminders rem
WHERE rem.id = marketingDatabase.id
)
ORDER BY rand()
LIMIT 1
Any thoughts on how to make that work better? I simply need a random 'id'.
Upvotes: 3
Views: 176
Reputation: 10341
First, see if we can optimise that query a little:
SELECT `m`.`id`
FROM `marketingDatabase` AS `m`
LEFT JOIN `reminders` AS `r` ON ( `r`.`id` = `m`.`id` )
WHERE
`m`.`do_not_call` != 'true'
AND `m`.`status` = 'Pending'
AND `m`.`install_id` = 'AN ID HERE'
AND `r`.`id` IS NULL
ORDER BY
rand()
LIMIT 1
NOTE: This is just an idea, and has not been tested in the wild.
Why not get a count of the possible number of records to find, and then use PHP to find a random row number from that count, then requery to find it.
$rowCount = 0;
$rowCountSql = "SELECT COUNT(*) AS `rowcount`
FROM `marketingDatabase` AS `m`
LEFT JOIN `reminders` AS `r` ON ( `r`.`id` = `m`.`id` )
WHERE
`m`.`do_not_call` != 'true'
AND `m`.`status` = 'Pending'
AND `m`.`install_id` = 'AN ID HERE'
AND `r`.`id` IS NULL";
if( $rowCountRes = mysql_query( $rowCountSql )
&& mysql_num_rows( $rowCountRes )
&& $r = mysql_fetch_assoc( $rowCountRes ) )
$rowCount = $r['rowcount'];
$oneRow = false;
$oneRowSql = "SELECT `m`.`id` AS `rowid`
FROM `marketingDatabase` AS `m`
LEFT JOIN `reminders` AS `r` ON ( `r`.`id` = `m`.`id` )
WHERE
`m`.`do_not_call` != 'true'
AND `m`.`status` = 'Pending'
AND `m`.`install_id` = 'AN ID HERE'
AND `r`.`id` IS NULL
LIMIT ".(int) $rowCount.", 1";
if( $oneRowRes = mysql_query( $rowCountSql )
&& mysql_num_rows( $oneRowRes )
&& $r = mysql_fetch_assoc( $oneRowRes ) )
$oneRow = $r['rowid'];
This may prove to have no performance benefits, but I just thought I would put it out there to see if any of my more learned colleagues could better it.
A further exploration of the above (which I would test, if I had access to your database...)
SELECT `m`.`id` AS `rowid`
FROM `marketingDatabase` AS `m`
LEFT JOIN `reminders` AS `r` ON ( `r`.`id` = `m`.`id` )
WHERE
`m`.`do_not_call` != 'true'
AND `m`.`status` = 'Pending'
AND `m`.`install_id` = 'AN ID HERE'
AND `r`.`id` IS NULL
LIMIT ( FLOOR( RAND( ) * (
SELECT COUNT(*) AS `rowcount`
FROM `marketingDatabase` AS `m`
LEFT JOIN `reminders` AS `r` ON ( `r`.`id` = `m`.`id` )
WHERE
`m`.`do_not_call` != 'true'
AND `m`.`status` = 'Pending'
AND `m`.`install_id` = 'AN ID HERE'
AND `r`.`id` IS NULL ) ) ) , 1
Just an idea...
Upvotes: 3
Reputation: 19979
We had the same issue which you are facing now, where the number of rows grew too high, the order by rand() was actually causing connections to hang and throwing white pages, so we had to come up with different solutions.
One option we took was to chunk out the ids in groups of 1000 and throw them into cache, then we would randomize which group we selected, then randomized which id out of the group to select.
We also wrote a sample of random ids to a flat-file once a day and read numbers from it, but I believe we moved away from that solution for the previous caching solution.
Just a few ideas.
Upvotes: 0