Reputation: 13507
Table dataset.transactions
contains ~5.000.000 records.
1) This query takes ~ 3 seconds:
SELECT *
FROM dataset.transactions
WHERE customer = 'c1' AND year = 2017
2) This query takes about 10 seconds:
SELECT
salesrep_id AS id,
date AS lastUsedForFplDate,
fpl AS individual_fpl,
ANY_VALUE(salesrep_name) AS salesrep_name,
ANY_VALUE(customer) AS customer
FROM dataset.transactions VT1
WHERE date = (
SELECT
MAX(date)
FROM dataset.transactions VT2
WHERE
VT1.salesrep_id = VT2.salesrep_id
)
GROUP BY
salesrep_id,
date,
fpl
3) Whereas this one takes more than 200 seconds (tried few times, cancelled it each time after 200 seconds):
WITH transactions AS (
SELECT *
FROM dataset.transactions
WHERE customer = 'c1' AND year = 2017
)
SELECT
salesrep_id AS id,
date AS lastUsedForFplDate,
fpl AS individual_fpl,
ANY_VALUE(salesrep_name) AS salesrep_name,
ANY_VALUE(customer) AS customer
FROM transactions VT1
WHERE date = (
SELECT
MAX(date)
FROM transactions VT2
WHERE
VT1.salesrep_id = VT2.salesrep_id
)
GROUP BY
salesrep_id,
date,
fpl
Third query consist of the two queries above, except the #2 uses output of #1 as a source.
Why does it take so much time?
Upvotes: 0
Views: 124
Reputation: 746
Just not to leave the question unanswered. The third query uses self-joins which is an anti pattern as commented by Elliott.
The result of using a self-join is that it potentially doubles the number of output rows. This increase in output data can cause poor performance.
Instead of using a self-join, it's recommended to use a window (analytic) function to reduce the number of additional bytes that are generated by the query.
The simplified syntax of Analytic Functions is as below:
analytic_function_name ( [ argument_list ] )
OVER (
[ PARTITION BY partition_expression_list ]
[ ORDER BY expression [{ ASC | DESC }] [, ...] ]
[ window_frame_clause ]
)
Upvotes: 1