Joseph Derblich
Joseph Derblich

Reputation: 43

Why does MySQL 5.7 order by affect temporary rank column?

I need to generate a list of employees ranked by percent of closings. I built the following stored procedure:

SET @cnt = 0;
SET @percent = 2.0;

SELECT
    CASE
        WHEN stats.close/(stats.open+stats.close) = @percent THEN @cnt
        ELSE (@cnt := @cnt + 1) 
    END rank,
    stats.employee,
    stats.close,
    stats.open,
    (@percent := stats.close/(stats.open+stats.close)) percent
         FROM stats
              WHERE stats.date = CURDATE()
                   ORDER BY percent

It Returns

| Rank | Employee | Close | Open | Percent|  
|     1|    Smith|       9|     1|    0.90|  
|     2|    Jones|      75|    25|    0.75|  
|     3|      Zed|       1|     9|    0.10|  
|     3|    Adams|      10|    90|    0.10|

This query works for its intended purpose but after reviewing the query it looks like it should not return correctly. Here is why I think so:

MySQL processes the SELECT prior to processing the ORDER BY. I would thereby assume that MySQL would assign the ranks in whatever order it decided to come out of the database and then sort the result set after. I would expect it to look like this:

| Rank | Employee | Close | Open | Percent|  
|     3|    Smith|       9|     1|    0.90|  
|     2|    Jones|      75|    25|    0.75|  
|     4|      Zed|       1|     9|    0.10|  
|     1|    Adams|      10|    90|    0.10|

Why is this not the case?

Upvotes: 1

Views: 810

Answers (1)

Paul Maxwell
Paul Maxwell

Reputation: 35603

Using @variables in any order isn't strictly speaking reliable, but it is a common "hack" to mimic window functions which MySQL still lacks (planned for v8.x i.e. still in pre-release).

This "hack" relies on the ORDER BY being processed together with the SELECT (not as 2 distinct steps) they are more integrated than you have expected. e.g.

SELECT * 
FROM   (SELECT CASE 
                 WHEN stats.close / ( stats.open + stats.close ) = @percent THEN 
                 @cnt 
                 ELSE ( @cnt := @cnt + 1 ) 
               end                                                        rank, 
               stats.employee, 
               stats.close, 
               stats.open, 
               ( @percent := stats.close / ( stats.open + stats.close ) ) 
               percent 
        FROM   stats 
        CROSS JOIN (@percent := 0 x, @cnt :=0 y) vars
        WHERE  stats.date = Curdate() 
        ORDER  BY percent ASC) d 
ORDER  BY percent DESC 

The inner order percent ASC will set ranks beginning at the lowest percent value, then the outer order will put the highest percent first.

The real problem is that the left to right (or top to bottom as seen above) sequence of expressions within the select clause isn't guaranteed. So it is possible that the comparison of @percent to calculate @cnt is not done in the manner shown in the written sql. In practice it works "most of the time", but might not. (So bring on the window functions!!)

Upvotes: 1

Related Questions