that0n3guy
that0n3guy

Reputation: 577

mysql for percentage between rows

I have some sql that looks like this:

SELECT 
stageName,
count(*) as `count`

FROM x2production.contact_stages
WHERE FROM_UNIXTIME(createDate) between  '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY) 
    AND (stageName = 'DI-Whatever' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting')
Group by stageName
Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Whatever')

This produces a table that looks like:

+-------------+-------+
|  stageName  | count |
+-------------+-------+
| DI-quote    |  1230 |
| DI-Meeting  |   985 |
| DI-Whatever |   325 |
+-------------+-------+

Question:

I would like a percentage from one row to the next. For example the percentage of DI-Meeting to DI-quote. The math would be 100*985/1230 = 80.0%

So in the end the table would look like so:

+-------------+-------+------+
|  stageName  | count | perc |
+-------------+-------+------+
| DI-quote    |  1230 | 0    |
| DI-Meeting  |   985 | 80.0 |
| DI-Whatever |   325 | 32.9 |
+-------------+-------+------+

Is there any way to do this in mysql?

Here is an SQL fiddle to mess w/ the data: http://sqlfiddle.com/#!9/61398/1

Upvotes: 2

Views: 351

Answers (2)

Drew
Drew

Reputation: 24959

The query

select stageName,count,if(rownum=1,0,round(count/toDivideBy*100,3)) as percent 
from 
(   select stageName,count,greatest(@rn:=@rn+1,0) as rownum, 
    coalesce(if(@rn=1,count,@prev),null) as toDivideBy,
    @prev:=count as dummy2 
    from 
    (   SELECT  
        stageName, 
        count(*) as `count` 
        FROM Table1 
        WHERE FROM_UNIXTIME(createDate) between  '2016-05-01' AND DATE_ADD('2016-08-31', INTERVAL 1 DAY)  
            AND (stageName = 'DI-Underwriting' OR stageName = 'DI-Quote' or stageName = 'DI-Meeting') 
        Group by stageName 
        Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting') 
    ) xDerived1 
    cross join (select @rn:=0,@prev:=-1) as xParams1 
) xDerived2; 

Results

+-----------------+-------+---------+
| stageName       | count | percent |
+-----------------+-------+---------+
| DI-Quote        |    16 |       0 |
| DI-Meeting      |    13 |  81.250 |
| DI-Underwriting |     4 |  30.769 |
+-----------------+-------+---------+

Note, you want a 0 as the percent for the first row. That is easily changed to 100.

The cross join brings in the variables for use and initializes them. The greatest and coalesce are used for safety in variable use as spelled out well in this article, and clues from the MySQL Manual Page Operator Precedence. The derived tables names are just that: every derived table needs a name.

If you do not adhere to the principles in those referenced articles, then the use of variables is unsafe. I am not saying I nailed it, but that safety is always my focus.

The assignment of variables need to follow a safe form, such as the @rn variable being set on the inside of a function like greatest or least. We know that @rn is always greater than 0. So we are using the greatest function to force our will on the query. Same trick with coalesce, null will never happen, and := has lower precedence in the column that follows it. That is, the last one: @prev:= which follows the coalesce.

That way, a variable is set before other columns in that select row attempt to use its value.

So, just getting the expected results does not mean you did it safely and that it will work with your real data.

Upvotes: 2

Jorge Campos
Jorge Campos

Reputation: 23371

What you need is to use a LAG function, since MySQL doesn't support it your have to mimic it this way:

select stageName,
       cnt,
       IF(valBefore is null,0,((100*cnt)/valBefore)) as perc
  from (SELECT tb.stageName, 
               tb.cnt,
               @ct AS valBefore,
              (@ct := cnt) 
         FROM (SELECT stageName,
                      count(*) as cnt
                 FROM Table1,
                      (SELECT @_stage = NULL,
                              @ct := NULL) vars
                WHERE FROM_UNIXTIME(createDate) between  '2016-05-01' 
                                                AND DATE_ADD('2016-08-31', INTERVAL 1 DAY) 
                  AND stageName in ('DI-Underwriting', 'DI-Quote', 'DI-Meeting')
                Group by stageName
                Order by field(stageName, 'DI-Quote', 'DI-Meeting', 'DI-Underwriting')                 
              ) tb
        WHERE (CASE WHEN @_stage IS NULL OR @_stage <> tb.stageName
                    THEN @ct := NULL 
                    ELSE NULL END IS NULL)  
       ) as final

See it working here: http://sqlfiddle.com/#!9/61398/35

EDIT I've actually edited it to remove an unnecessary step (subquery)

Upvotes: 1

Related Questions