jampez77
jampez77

Reputation: 5241

Need help reducing very large MySQL query

I'm working on a project for managing submissions of Key Performance Indicators (KPIs) for onsite projects. Each submission has Performance Measures (PMs) attached to it, these are decided when the submission is enter on site.

The PMs are given a score by the user depending on the level of completion. I am now working on producing a report using all this data. I have a query that works but it is very large and cumbersome. I'm trying to find a way to streamline it.

I have a query that produces a list of KPIs. With this list I then apply some sub queries and simple maths to produce a 'score' for that period.

There are 13 periods. The scores are calculated based on the performance measure socres from completed submissions for the current year. There are two types of submissions 'panel' and 'programme'.

A score needs to be calculated for both submission types for each period. The average of these scores are then taken and the final score is calculated.

The final score is called the pain/gain indicator. we take the calculated score, divide it by 100 and then multiply by 0.3. If the number is less than one then we subtract one from pain/gain, respectively if the number is more than 0 we add 1 to it.

Period 1 is different as we only need to calculate the programme submissions, see below:

PERIOD 1

(
    SELECT
    ROUND(AVG(sp.progress), 2)
    FROM `submissions` AS sub
    LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
    LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
    WHERE sub.programme = 'programme'
    AND sub.submission_year = '2015/2016' //current year
    AND sub.state = '2' //state 2 is complete
    AND sub.period = '1'
    AND pm.kpi_id = kpi.id
) AS p1

The other periods however need the score based on both submission types:

PERIOD 2 - 12

CASE 
WHEN

( 
      IFNULL((
              SELECT
              ROUND(AVG(sp.progress), 2)
              FROM `submissions` AS sub
              LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
              LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
              WHERE sub.period = '2'
              AND sub.submission_year = '2015/2016'
              AND sub.state = '2'
              AND sub.panel = 'panel'
              AND pm.kpi_id = kpi.id
             ), 0) + 
      IFNULL((
              SELECT
              ROUND(AVG(sp.progress), 2)
              FROM `submissions` AS sub
              LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
              LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
              WHERE sub.programme = 'programme'
              AND sub.submission_year = '2015/2016'
              AND sub.state = '2'
              AND sub.period = '2'
              AND pm.kpi_id = kpi.id
            ), 0)/2) / 100 * 0.3 < 0 THEN 

       1- ROUND(( 
               IFNULL((
                        SELECT
                        ROUND(AVG(sp.progress), 2)
                        FROM `submissions` AS sub
                        LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                        LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                        WHERE sub.period = '2'
                        AND sub.submission_year = '2015/2016'
                        AND sub.state = '2'
                        AND sub.panel = 'panel'
                        AND pm.kpi_id = kpi.id
                      ), 0) + 
                IFNULL((
                         SELECT
                         ROUND(AVG(sp.progress), 2)
                         FROM `submissions` AS sub
                         LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                         LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                         WHERE sub.programme = 'programme'
                         AND sub.submission_year = '2015/2016'
                         AND sub.state = '2'
                         AND sub.period = '2'
                         AND pm.kpi_id = kpi.id
                        ), 0)/2) / 100 * 0.3, 2)
        WHEN 

               ( 
                IFNULL((
                        SELECT
                        ROUND(AVG(sp.progress), 2)
                        FROM `submissions` AS sub
                        LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                        LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                        WHERE sub.period = '2'
                        AND sub.submission_year = '2015/2016'
                        AND sub.state = '2'
                        AND sub.panel = 'panel'
                        AND pm.kpi_id = kpi.id
                       ), 0) + 
                IFNULL((
                        SELECT
                        ROUND(AVG(sp.progress), 2)
                        FROM `submissions` AS sub
                        LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                        LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                        WHERE sub.programme = 'programme'
                        AND sub.submission_year = '2015/2016'
                        AND sub.state = '2'
                        AND sub.period = '2'
                        AND pm.kpi_id = kpi.id
                       ), 0)/2) / 100 * 0.3 > 0 THEN 

      1+ ROUND(( 
               IFNULL((
                       SELECT
                       ROUND(AVG(sp.progress), 2)
                       FROM `submissions` AS sub
                       LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                       LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                       WHERE sub.period = '2'
                       AND sub.submission_year = '2015/2016'
                       AND sub.state = '2'
                       AND sub.panel = 'panel'
                       AND pm.kpi_id = kpi.id
                     ), 0) + 
              IFNULL((
                      SELECT
                      ROUND(AVG(sp.progress), 2)
                      FROM `submissions` AS sub
                      LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                      LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                      WHERE sub.programme = 'programme'
                      AND sub.submission_year = '2015/2016'
                      AND sub.state = '2'
                      AND sub.period = '2'
                      AND pm.kpi_id = kpi.id
                     ), 0)/2) / 100 * 0.3, 2)
         END
END AS p2

This is currently being repeated for the remaining periods. The data I get back is fine but the query is HUGE and ugly. On top of that I want a column at the end of the results that creates an average score for each KPI over all 13 periods. I know I can't reference a column I created by it's alias so currently I'd have to repeat the subqueries AGAIN to get this score.

Is the anyway I can make this query more efficient or loop through periods 2 - 13?

The full query with the results is below:

FULL QUERY UP TO PERIOD 2

        SELECT
        kra.id AS id,
        kra.kra_name AS name,
        kra.panel_weighting AS panel_weighting,
        kra.programme_weighting AS programme_weighting,
        kpi.id AS id,
        kpi.kpi_name AS name,
        kpi.panel_weighting AS panel_weighting,
        kpi.programme_weighting AS programme_weighting,
        (
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '1'
            AND pm.kpi_id = kpi.id
    ) AS p1,
     CASE  
      WHEN (
                SELECT
                ROUND(AVG(sp.progress), 2)
                FROM `submissions` AS sub
                LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                WHERE sub.period = '2'
                AND sub.submission_year = '2015/2016'
                AND sub.state = '2'
                AND sub.panel = 'panel'
                AND pm.kpi_id = kpi.id
      ) IS NULL 
      AND
      (
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '2'
            AND pm.kpi_id = kpi.id
           ) IS NULL
           THEN NULL
     ELSE 

     CASE 
     WHEN

            ( 
             IFNULL((
                SELECT
                ROUND(AVG(sp.progress), 2)
                FROM `submissions` AS sub
                LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                WHERE sub.period = '2'
                AND sub.submission_year = '2015/2016'
                AND sub.state = '2'
                AND sub.panel = 'panel'
                AND pm.kpi_id = kpi.id
         ), 0) + 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '2'
            AND pm.kpi_id = kpi.id
          ), 0)/2) / 100 * 0.3 < 0 THEN 

         1- ROUND(( 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.period = '2'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.panel = 'panel'
            AND pm.kpi_id = kpi.id
         ), 0) + 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '2'
                AND pm.kpi_id = kpi.id
              ), 0)/2) / 100 * 0.3, 2)
     WHEN 

         ( 
             IFNULL((
                SELECT
                ROUND(AVG(sp.progress), 2)
                FROM `submissions` AS sub
                LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
                LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
                WHERE sub.period = '2'
                AND sub.submission_year = '2015/2016'
                AND sub.state = '2'
                AND sub.panel = 'panel'
                AND pm.kpi_id = kpi.id
         ), 0) + 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '2'
            AND pm.kpi_id = kpi.id
          ), 0)/2) / 100 * 0.3 > 0 THEN 

         1+ ROUND(( 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.period = '2'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.panel = 'panel'
            AND pm.kpi_id = kpi.id
         ), 0) + 
         IFNULL((
            SELECT
            ROUND(AVG(sp.progress), 2)
            FROM `submissions` AS sub
            LEFT JOIN `sub_performancemeasures` sp ON sp.sub_id = sub.id
            LEFT JOIN `performance_measures` pm ON pm.id = sp.pm_id
            WHERE sub.programme = 'programme'
            AND sub.submission_year = '2015/2016'
            AND sub.state = '2'
            AND sub.period = '2'
                AND pm.kpi_id = kpi.id
              ), 0)/2) / 100 * 0.3, 2)
     END
     END AS p2

    FROM `kpis` kpi
    LEFT JOIN `key_reporting_areas` kra ON kra.id = kpi.kra_id
    GROUP BY kpi.id

Upvotes: 0

Views: 53

Answers (1)

DRapp
DRapp

Reputation: 48139

Since all your underlying sub-queries are based on the corresponding "kpi.id", which is associated with your "pm.kpi_id", that left-join has now pulled everything into an implied INNER join via the WHERE clause. Is that what you expected?

Now, how to possibly streamline it. I would do one big query of each of the period/states into a single "pre-query", then join it ONCE to your "kpi" table based on the ID and pull whatever final values. This is just a SAMPLE, but SHOULD be a good enough guide to what I am referring.

For indexing, I would start by ensuring your "submissions" table has an index on ( submission_year, state, period, programme ) to help optimization.

SELECT
      kra.id AS id,
      kra.kra_name AS name,
      kra.panel_weighting AS panel_weighting,
      kra.programme_weighting AS programme_weighting,
      kpi.id AS id,
      kpi.kpi_name AS name,
      kpi.panel_weighting AS panel_weighting,
      kpi.programme_weighting AS programme_weighting,
      PreAgg.AvgProgrammePer1,
      PreAgg.AvgPanelPer1,
      PreAgg.AdjFactorPer1,
      CASE when PreAgg.AdjFactorPer1 < 0 then 1 - PreAgg.AdjFactorPer1
           when PreAgg.AdjFactorPer1 > 0 then 1 + PreAgg.AdjFactorPer1
           END as FinalAdjPer1,
      PreAgg.AvgProgrammePer2,
      PreAgg.AvgPanelPer2,
      PreAgg.AdjFactorPer2,
      CASE when PreAgg.AdjFactorPer2 < 0 then 1 - PreAgg.AdjFactorPer2
           when PreAgg.AdjFactorPer2 > 0 then 1 + PreAgg.AdjFactorPer2
           END as FinalAdjPer2,
      PreAgg.AvgProgrammePer3,
      PreAgg.AvgPanelPer3,
      PreAgg.AdjFactorPer3,
      CASE when PreAgg.AdjFactorPer3 < 0 then 1 - PreAgg.AdjFactorPer3
           when PreAgg.AdjFactorPer3 > 0 then 1 + PreAgg.AdjFactorPer3
           END as FinalAdjPer3
   from
      kpis kpi
         LEFT JOIN
         ( select
                 KPI_Period.kpi_id,
                 CASE when KPI_Period.Period = 1 
                      then AvgProgramme END as AvgProgrammePer1,
                 CASE when KPI_Period.Period = 1 
                      then AvgPanel END as AvgPanelPer1,
                 CASE when KPI_Period.Period = 1 
                      then (( coalesce( AvgProgramme, 0 )
                            + coalesce( AvgPanel, 0 )) / 2 ) / 100 * .3 as AdjFactorPer1,
                 CASE when KPI_Period.Period = 2 
                      then AvgProgramme END as AvgProgrammePer2,
                 CASE when KPI_Period.Period = 2 
                      then AvgPanel END as AvgPanelPer2,
                 CASE when KPI_Period.Period = 2 
                      then (( coalesce( AvgProgramme, 0 )
                            + coalesce( AvgPanel, 0 )) / 2 ) / 100 * .3 as AdjFactorPer2,
                 CASE when KPI_Period.Period = 3 
                      then AvgProgramme END as AvgProgrammePer3,
                 CASE when KPI_Period.Period = 3 
                      then AvgPanel END as AvgPanelPer3,
                 CASE when KPI_Period.Period = 3 
                      then (( coalesce( AvgProgramme, 0 )
                            + coalesce( AvgPanel, 0 )) / 2 ) / 100 * .3 as AdjFactorPer3
              from
                 ( SELECT
                         pm.kpi_id,
                         sub.period,
                         AVG( CASE when sub.programme = 'programme' then sp.progress end ) as AvgProgramme,
                         AVG( CASE when sub.programme = 'panel' then sp.progress end ) as AvgPanel,
                      FROM 
                         submissions AS sub
                            LEFT JOIN sub_performancemeasures sp 
                               ON sub.id = sp.sub_id
                               LEFT JOIN performance_measures pm 
                                  ON sp.pm_id = pm.id
                      WHERE 
                             sub.submission_year = '2015/2016'
                         AND sub.state = '2'
                         AND sub.programme IN ( 'programme', 'panel' ) 
                      GROUP BY
                         pm.kpi_id,
                         sub.period ) as KPI_Period
              group by
                 KPI_Period.kpi_id ) PreAgg

         ON kpi.id = PreAgg.kpi_id

Now, to explain what I am attempting here. The inner-most query is going directly to your underlying tables for submission progress activity based on the year 2015/2016, the state = 2 and the programme as either programme or panel. I am grouping the averages based on each PERIOD. Yes, you can adjust for the ROUNDING of the avg, but I did the average. So, the end of this phase, I would have something like...

KPI_ID   Period   AvgProgramme   AvgPanel
1        1        prog1.blah      panel1.blah
1        2        prog2.blah      panel2.blah
1        3        prog3.blah      panel3.blah

As the rest of the year progresses, so too will the records for period 4-12(13 if four-week periods vs monthly periods). This result is the "KPI_Period" alias subselect select.

Now, that this has been simplified for all possible periods as like a "year-to-date" possibility, I am building a cross-tabbing each period into columns. Now, this part will significantly simplify your IFNULL() conditions. COALESCE() will take a value, and if null, use the second parameter as the value to apply whatever. So, the next level up brings us the

CASE when KPI_Period.Period = 1 
     then AvgProgramme END as AvgProgrammePer1,
CASE when KPI_Period.Period = 1 
     then AvgPanel END as AvgPanelPer1,
CASE when KPI_Period.Period = 1 
     then (( coalesce( AvgProgramme, 0 )
           + coalesce( AvgPanel, 0 )) / 2 ) / 100 * .3 as AdjFactorPer1,

Notice the case/when only cares about period 1 (or 2, or 3 respectively). So, for Period = 1, I would be getting the AvgProgramme value of "prog1.blah" and "panel1.blah" which could be null, or an actual value. So they are what they are, taken as is, but assigned a final name of "AvgProgrammePer1" and "AvgPanelPer1" respectively.

Now, your crazy computation. I am also pre-rolling up this computation ONCE via applying the COALESCE.

(   ( coalesce( AvgProgramme, 0 ) + coalesce( AvgPanel, 0 )) 
    / 2 ) / 100 * .3 as AdjFactorPer1,

So, if the AvgProgramme OR AvgPanel is a null value, it will be converted to 0 for the computation. So you will never have a divide by 0 failure, but 0/2 is zero, then 0 / 100 = 0... * .3 = 0. So, if both values are null, your final adjustment factor is zero and thus no +/- 1 applied in the final set. This is applied for each "period" giving a result something like...

(abbreviated columns and sample data)
KP  AvgProg1  AvgPan1  Adj1  AvgProg2  AvgPan2  Adj2  AvgProg3  AvgPan3  Adj3
1   prog1blah pan1blah avg1  prog2blah pan2blah avg2  prog3blah pan3blah avg3

So now, for EACH KPI_ID, you have 1 row with all the programme, panel and adjustment computation factors applied, no Cartesian results, so no duplicate accounting should occur. This is the "PreAgg" (pre-aggregate) final result that will be joined to your KPI table.

Now, the top-most level of all the fields. You can (as I sampled), grab all the columns, but here is the simplified part of the adjustment factor. Since it is already pre-computed at the Pre-Agg level, we can now just apply the factoring

PreAgg.AvgProgrammePer1,
PreAgg.AvgPanelPer1,
PreAgg.AdjFactorPer1,
CASE when PreAgg.AdjFactorPer1 < 0 then 1 - PreAgg.AdjFactorPer1
     when PreAgg.AdjFactorPer1 > 0 then 1 + PreAgg.AdjFactorPer1
     END as FinalAdjPer1,

You can always remove the redundant columns you don't care to have in the final result, but at least CAN have them to confirm computations along the way. And this query applies for 3 full periods... just needs the additional case/when blockes for periods 4-12, no additional subquerying constructs to bang your head with.

Upvotes: 1

Related Questions