Kevin Vermaat
Kevin Vermaat

Reputation: 102

(PHP) MySQL random rows big table with order by and certain range

I have this table:

person_id int(10) pk
fid bigint(20) unique
points int(6) index
birthday date index
4 FK columns int(6)
ENGINE = MyISAM

Important info: the table contains over 8 million rows and is fast growing (1.5M a day at the moment)

What I want: to select 4 random rows in a certain range when I order the table on points

How I do it now: In PHP I randomize a certain range, let's say this gives me 20% as low range and 30% as high range. Next I count(*) the number of rows in table. After I determine the lowest row number: table count / 100 * low range. Same for high range. After I calculate a random row by using rand(lowest_row, highest_row), which gives me a row number within the range. And at last I select the random row by doing:

SELECT * FROM `persons` WHERE points > 0 ORDER BY points desc LIMIT $random_offset, 1;

The points > 0 is in the query since I only want randoms with at least 1 point.

Above query takes about 1.5 seconds to run, but since I need 4 rows it takes over 6 seconds, which is too slow for me. I figured the order by points takes the most time, so I was thinking about making a VIEW of the table, but I have really no experience with views, so what do you think? Is a view a good option or are there better solutions?

ADDED:

I forgot to say that it is important that all rows has the same chance of being selected.

Thanks, I appreciate all the help! :)

Kevin

Upvotes: 2

Views: 672

Answers (2)

Mike Brant
Mike Brant

Reputation: 71384

Views aren't going to do anything to help your performance here. My suggestion would be to simply run:

SELECT * FROM `persons` WHERE points BETWEEN ? AND ?

Make sure you have an index on points. Also, you SHOULD replace * with only the fields you are concerned about if applicable. Here is course ? represents the upper and lower bounds for your search.

You can then determine the number of rows returned in the result set using mysqli_num_rows() (or similar based on your DB library of choice).

You now have the total number of rows that meet your criteria. You can easily then calculate 4 random numbers within the range of results and use mysqli_data_seek() or similar to go directly to the record at the random offset and get the values you want from it.

Putting it all together:

$result = mysqli_query($db_conn, $sql); // here $sql is your SQL query
$num_records = 4; // your number of records to return
$num_rows = mysqli_num_rows($result);
$rows = array();

while ($i = 0; $i < $num_records; $i++) {
   $random_offset = rand(0, $num_rows - 1);
   mysqli_data_seek($result, $random_offset);
   $rows[] = mysqli_fetch_object($result);
}

mysqli_free_result($result);

Upvotes: 1

Niels Keurentjes
Niels Keurentjes

Reputation: 41958

Your query is so slow, and will become exponentially slower, because using LIMIT here forces it to do a full table sort, and then a full table scan, to get the result. Instead you should do this on the PHP end of things as well (this kind of 'abuse' of LIMIT is actually the reason it's non-standard SQL and for example MSSQL and Oracle do not support it).

First ensure there's an index on points. This will make select max(points), min(points) from persons a query that'll return instantly. Next you can determine from those 2 results the points range, and use rand() to determine 4 points in the requested range. Then repeat for each result:

SELECT * FROM persons WHERE points < $myValue ORDER BY points DESC LIMIT 1

Since it only has to retrieve one row, and can determine which one via the index, this'll be in the milliseconds execution time as well.

Upvotes: 1

Related Questions