AleOtero93
AleOtero93

Reputation: 498

Avoid multiple subqueries

I'm manipulating a long, very queried table (> 500 millions entries), so it's very important to avoid large queries.

Currently I need to get some values with a condition (will explain better in a moment) and then, check if that values are in another group of values (all of this refering to the same field). I'm creating a view of the table, using a with.

So, here is the table syntax: (table employee)

+--------+-------------+-----------+--------+---------+-----------+
| period | employee_id | operation | sub_op | payment | work_zone |
+--------+-------------+-----------+--------+---------+-----------+

Periods have this format 'YYMM', one period refers to one month.

Of course the table is much, much longer than this sample, but I need only those fields in the query. A brief explanation of what I need, and then the query itself.

I need to get all employee_id in the current period, with a considerable payment (at least $250) and a specific operation (first I group that operations with sub_op value). The operation value asked is 97, and in the query you will see how I group it.

Now, to that values, I group them by work_zone and the grouped operation values. And now the subqueries start... I need:

  1. All that values that were not in past period.
  2. All that values that were not in last 36 periods (3 years).
  3. All that values that were in at least one of last 36 periods.
  4. All that values that were in at least one of last 36 periods, but with a diferent operation.
  5. All that values that were in at least one of last 36 periods, but with a payment lower than $250.

So, here is the query I've got so far. (I'm using as period '1109')

CREATE OR REPLACE VIEW hired_fired AS
WITH query_hired_fired AS ( 
    SELECT work_zone, operation, sub_op, employee_id,
        CASE 
            WHEN operation = 97 THEN 
                CASE 
                    WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                    WHEN sub_op IN (2,6) THEN 'Waitress' 
                    WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                    WHEN sub_op = 11 THEN 'Security' 
                    WHEN sub_op IN (12,13) THEN 'Cleaners' 
                    ELSE 'Others' 
                END
        END AS opgroup
    FROM employee 
    WHERE period = 1109 AND payment >= 250 AND operation = 97
)
SELECT 201109 AS periodo, opgroup, work_zone
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period = 1108 AND payment >= 250 AND operation = 97)) AS total, 
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS absolut,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS reincorporated,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97)) AS operation_change,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97)) AS raised,
FROM query_hired_fired
GROUP BY work_zone, opgroup

So, my question is... Is there anyway I can do this query without all the subqueries? I think this would take several hours to run, and that is not a posibility working with this table.

Sorry if I have been unclear with something, I will answers all comentaries and doubts as soon as possible. Thanks.

Upvotes: 0

Views: 162

Answers (2)

DRapp
DRapp

Reputation: 48139

I am somewhat similar to Kordirko, but wrapped into one. The inner "PreCalc" query is on the premise to compute one row per employee with a flag of 1 or 0 if they met the condition. Since all your criteria are based on the range to either be just 1108 OR ( between 0808 and 1108), that subquery just gets ALL records between the 0808 and 1108 so it will simplify the readability of the complex case/when conditions. The only condition I applied it was the first where you were specifically looking for the exact prior period. That said, the rest of the items were the qualifiers of payment amount and is (or not) operation 97. So for any employee, the flags would be set to 1 or 0 respectively.

Now, that gets applied to the outer query doing a SUM/CASE. To account for your "NOT IN", I am looking for the given flag = 0 (thus did not qualify from the underlying data) vs flag = 1 it DID qualify in the underlying data.

Since the pre-query computed the "opgroup" as well, it is all wrapped up nicely.

I would make sure that yourtable has an index on the ( period, employee_id, work_zone ) to help optimize. You might go further with the index keys to make it a covering index, but see how this works first.

SELECT
      201109 AS periodo, 
      work_zone, 
      opgroup,
      SUM( case when PreCalc.LPOver250 == 0 end ) as EmpsNotInLastPeriodOver250,
      SUM( case when PreCalc.Over250Op97 == 0 end ) as EmpsNotInOver250Per97,
      SUM( case when PreCalc.Over250Op97 == 1 end ) as EmpsInOver250Per97,
      SUM( case when PreCalc.Over250NotOp97 == 1 end ) as EmpsOver250NotInOp97,
      SUM( case when PreCalc.Under250 == 1 end ) as EmpsUnder250
   from
      ( SELECT
              Employee_ID,
              work_zone, 
              CASE WHEN operation = 97 THEN 
                   CASE WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                        WHEN sub_op IN (2,6) THEN 'Waitress' 
                        WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                        WHEN sub_op = 11 THEN 'Security' 
                        WHEN sub_op IN (12,13) THEN 'Cleaners' 
                        ELSE 'Others' 
                   END
              END AS opgroup, 
              MAX( case when period = 1108 
                        and payment >= 250 
                        and operation = 97 then 1 else 0 end ) as LPOver250,
              MAX( case when payment >= 250 
                        and operation = 97 then 1 else 0 end ) as Over250Op97,
              MAX( case when payment >= 250 
                        and operation != 97 then 1 else 0 end ) as Over250NotOp97,
              MAX( case when payment < 250 
                        and operation = 97 then 1 else 0 end ) as Under250
           from
              employee 
           where
              period between 0808 and 1108
           group by
              Employee_ID,
              work_zone, 
              opgroup ) PreCalc
   group by
      work_zone, 
      opgroup

Upvotes: 1

krokodilko
krokodilko

Reputation: 36107

Try this query:

WITH query_hired_fired AS ( 
    SELECT work_zone, operation, sub_op, employee_id,
        CASE 
            WHEN operation = 97 THEN 
                CASE 
                    WHEN sub_op IN (1,3,5) THEN 'Cookers' 
                    WHEN sub_op IN (2,6) THEN 'Waitress' 
                    WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                    WHEN sub_op = 11 THEN 'Security' 
                    WHEN sub_op IN (12,13) THEN 'Cleaners' 
                    ELSE 'Others' 
                END
        END AS opgroup
    FROM employee 
)
SELECT opgroup, work_zone,
       SUM( x_period_1109 * x_total )            As total,
       SUM( x_period_1109 * x_absolut )          As absolut,
       SUM( x_period_1109 * x_reincorporated )   As reincorporated,
       SUM( x_period_1109 * x_operation_change ) As operation_change,
       SUM( x_period_1109 * x_raised )           As raised
FROM (
    SELECT opgroup, work_zone, employee_id,
           MAX( CASE WHEN period = 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END) as x_total,
           MAX( CASE WHEN period = 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_absolut,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_reincorporated,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97 THEN 1 ELSE 0 END ) as x_operation_change,
           MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_raised,
           MAX( CASE WHEN period = '1109' AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) As x_period_1109
    FROM query_hired_fired
    WHERE period BETWEEN 0808 AND 1109
    GROUP BY opgroup, work_zone, employee_id
) x
GROUP BY work_zone, opgroup

This condition in your query: BETWEEN 1108 AND 0808 always evaluates to false,
I think it should be: BETWEEN 0808 AND 1108

Upvotes: 1

Related Questions