Reputation: 2817
I have a MySQL SELECT query which uses 20 different comparisons within the same table. Here's an example:
SELECT * FROM mytable
WHERE (col1 > (col2 * 0.25))
AND (col5 < col10) .......
I'm trying to calculate percentile ranks based on the order of a column called SCORE
within the SELECT
results returned. I've tried using incremental row numbers and COUNT(*)
to get the stock's rank and total number of results returned but not sure how to assign the same rank where some of the results have the same SCORE
.
Here's the formula that I'm trying to calculate:
((COUNT(lower scores) + (COUNT(same/tied scores) / 2)) * 100) / COUNT(total results)
How do I find the number of lower scores, same/tied scores and total scores within the same result row for calculating percentiles on the fly?
I'm trying to avoid using stored procedures because I want to my application's admins to tailor the SELECT statement within my applications admin area as needed.
Upvotes: 2
Views: 4905
Reputation: 111
The responses from Shlomi and Zishan (which uses Shlomi's code) definitely do not give accurate results, as I discovered by examining the results on a large table of mine. As answered elsewhere, it is apparently impossible to calculate percentile ranks in a single MySQL query: SQL rank percentile
The Shlomi Noach approach using user-defined variables does - at first - look like it's working fine for the top couple percent of rankings, but it quickly degenerates for the lower-ranking rows in your table. Look at your data results for yourself, as I did.
See this blog post by Roland Bouman about why Shlomi's approach using user-defined variables within a single SQL statement doesn't work, with a proposed better solution:
http://rpbouman.blogspot.com/2009/09/mysql-another-ranking-trick.html
So then I adapted Bouman's code for this purpose and here's my solution, which necessarily combines PHP and MySQL:
Step 1) Calculate and store the absolute rank for each row by submitting the following two queries:
SET @@group_concat_max_len := @@max_allowed_packet;
UPDATE mytable INNER JOIN (SELECT ID, FIND_IN_SET(
score,
(SELECT GROUP_CONCAT(
DISTINCT score
ORDER BY score DESC
)
FROM mytable)
) AS rank
FROM mytable) AS a
ON mytable.ID=a.ID
SET mytable.rank = rank;
Step 2: Fetch the total number of rows (and store the result in a PHP variable $total)
SELECT COUNT(ID) FROM mytable
Step 3: Use a PHP loop to iterate through the table to use the absolute rank for each row to calculate the row's percentile rank:
3a) Loop through:
SELECT ID, rank FROM mytable
while storing those row values as $ID and $rank in PHP
3b) For each row run:
$sql = 'UPDATE mytable INNER JOIN (
SELECT (100*COUNT(ID)/'.$total.') percentile
FROM mytable
WHERE rank >= '.$rank.'
) a
ON mytable.ID = a.ID
WHERE mytable.ID='.$ID.'
SET mytable.percentile = a.percentile';
Probably not the most efficient process, but definitely accurate, and since in my case the 'score' value is not updated very often, so I run the above script as a cron batch operation to keep the percentile ranks up-to-date.
Upvotes: 1
Reputation: 2817
Using Shlomi's code above, here's the code that I came up with to calculate percentile ranks (in case anyone wants to calculate these in the future):
SELECT
c.id, c.score, ROUND(((@rank - rank) / @rank) * 100, 2) AS percentile_rank
FROM
(SELECT
*,
@prev:=@curr,
@curr:=a.score,
@rank:=IF(@prev = @curr, @rank, @rank + 1) AS rank
FROM
(SELECT id, score FROM mytable) AS a,
(SELECT @curr:= null, @prev:= null, @rank:= 0) AS b
ORDER BY score DESC) AS c;
Upvotes: 3
Reputation: 9394
Here's a post (of mine) which explains ranking during SELECT
: SQL: Rank without Self Join.
It uses user defined variables which are accessed and assigned even as the rows are being iterated.
Using the same logic, it could be extended to include numbers of total scores, distinct scores etc. As a preview, here's a typical query:
SELECT
score_id, student_name, score,
@prev := @curr,
@curr := score,
@rank := IF(@prev = @curr, @rank, @rank+1) AS rank
FROM
score,
(SELECT @curr := null, @prev := null, @rank := 0) sel1
ORDER BY score DESC
;
Upvotes: 2