Reputation: 498
I'm manipulating a long, very queried table (> 500 millions entries), so it's very important to avoid large queries.
Currently I need to get some values with a condition (will explain better in a moment) and then, check if that values are in another group of values (all of this refering to the same field). I'm creating a view of the table, using a with
.
So, here is the table syntax: (table employee)
+--------+-------------+-----------+--------+---------+-----------+
| period | employee_id | operation | sub_op | payment | work_zone |
+--------+-------------+-----------+--------+---------+-----------+
Periods have this format 'YYMM', one period refers to one month.
Of course the table is much, much longer than this sample, but I need only those fields in the query. A brief explanation of what I need, and then the query itself.
I need to get all employee_id
in the current period
, with a considerable payment
(at least $250) and a specific operation
(first I group that operations
with sub_op
value). The operation
value asked is 97, and in the query you will see how I group it.
Now, to that values, I group them by work_zone
and the grouped operation
values. And now the subqueries start... I need:
So, here is the query I've got so far. (I'm using as period '1109')
CREATE OR REPLACE VIEW hired_fired AS
WITH query_hired_fired AS (
SELECT work_zone, operation, sub_op, employee_id,
CASE
WHEN operation = 97 THEN
CASE
WHEN sub_op IN (1,3,5) THEN 'Cookers'
WHEN sub_op IN (2,6) THEN 'Waitress'
WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
WHEN sub_op = 11 THEN 'Security'
WHEN sub_op IN (12,13) THEN 'Cleaners'
ELSE 'Others'
END
END AS opgroup
FROM employee
WHERE period = 1109 AND payment >= 250 AND operation = 97
)
SELECT 201109 AS periodo, opgroup, work_zone
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period = 1108 AND payment >= 250 AND operation = 97)) AS total,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS absolut,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS reincorporated,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97)) AS operation_change,
(SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97)) AS raised,
FROM query_hired_fired
GROUP BY work_zone, opgroup
So, my question is... Is there anyway I can do this query without all the subqueries? I think this would take several hours to run, and that is not a posibility working with this table.
Sorry if I have been unclear with something, I will answers all comentaries and doubts as soon as possible. Thanks.
Upvotes: 0
Views: 162
Reputation: 48139
I am somewhat similar to Kordirko, but wrapped into one. The inner "PreCalc" query is on the premise to compute one row per employee with a flag of 1 or 0 if they met the condition. Since all your criteria are based on the range to either be just 1108 OR ( between 0808 and 1108), that subquery just gets ALL records between the 0808 and 1108 so it will simplify the readability of the complex case/when conditions. The only condition I applied it was the first where you were specifically looking for the exact prior period. That said, the rest of the items were the qualifiers of payment amount and is (or not) operation 97. So for any employee, the flags would be set to 1 or 0 respectively.
Now, that gets applied to the outer query doing a SUM/CASE. To account for your "NOT IN", I am looking for the given flag = 0 (thus did not qualify from the underlying data) vs flag = 1 it DID qualify in the underlying data.
Since the pre-query computed the "opgroup" as well, it is all wrapped up nicely.
I would make sure that yourtable has an index on the ( period, employee_id, work_zone ) to help optimize. You might go further with the index keys to make it a covering index, but see how this works first.
SELECT
201109 AS periodo,
work_zone,
opgroup,
SUM( case when PreCalc.LPOver250 == 0 end ) as EmpsNotInLastPeriodOver250,
SUM( case when PreCalc.Over250Op97 == 0 end ) as EmpsNotInOver250Per97,
SUM( case when PreCalc.Over250Op97 == 1 end ) as EmpsInOver250Per97,
SUM( case when PreCalc.Over250NotOp97 == 1 end ) as EmpsOver250NotInOp97,
SUM( case when PreCalc.Under250 == 1 end ) as EmpsUnder250
from
( SELECT
Employee_ID,
work_zone,
CASE WHEN operation = 97 THEN
CASE WHEN sub_op IN (1,3,5) THEN 'Cookers'
WHEN sub_op IN (2,6) THEN 'Waitress'
WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
WHEN sub_op = 11 THEN 'Security'
WHEN sub_op IN (12,13) THEN 'Cleaners'
ELSE 'Others'
END
END AS opgroup,
MAX( case when period = 1108
and payment >= 250
and operation = 97 then 1 else 0 end ) as LPOver250,
MAX( case when payment >= 250
and operation = 97 then 1 else 0 end ) as Over250Op97,
MAX( case when payment >= 250
and operation != 97 then 1 else 0 end ) as Over250NotOp97,
MAX( case when payment < 250
and operation = 97 then 1 else 0 end ) as Under250
from
employee
where
period between 0808 and 1108
group by
Employee_ID,
work_zone,
opgroup ) PreCalc
group by
work_zone,
opgroup
Upvotes: 1
Reputation: 36107
Try this query:
WITH query_hired_fired AS (
SELECT work_zone, operation, sub_op, employee_id,
CASE
WHEN operation = 97 THEN
CASE
WHEN sub_op IN (1,3,5) THEN 'Cookers'
WHEN sub_op IN (2,6) THEN 'Waitress'
WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
WHEN sub_op = 11 THEN 'Security'
WHEN sub_op IN (12,13) THEN 'Cleaners'
ELSE 'Others'
END
END AS opgroup
FROM employee
)
SELECT opgroup, work_zone,
SUM( x_period_1109 * x_total ) As total,
SUM( x_period_1109 * x_absolut ) As absolut,
SUM( x_period_1109 * x_reincorporated ) As reincorporated,
SUM( x_period_1109 * x_operation_change ) As operation_change,
SUM( x_period_1109 * x_raised ) As raised
FROM (
SELECT opgroup, work_zone, employee_id,
MAX( CASE WHEN period = 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END) as x_total,
MAX( CASE WHEN period = 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_absolut,
MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_reincorporated,
MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97 THEN 1 ELSE 0 END ) as x_operation_change,
MAX( CASE WHEN period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97 THEN 1 ELSE 0 END ) as x_raised,
MAX( CASE WHEN period = '1109' AND payment >= 250 AND operation = 97 THEN 1 ELSE 0 END ) As x_period_1109
FROM query_hired_fired
WHERE period BETWEEN 0808 AND 1109
GROUP BY opgroup, work_zone, employee_id
) x
GROUP BY work_zone, opgroup
This condition in your query: BETWEEN 1108 AND 0808
always evaluates to false,
I think it should be: BETWEEN 0808 AND 1108
Upvotes: 1