Reputation: 347
I am blocked on reproducing in BigQuery a query that is similar to the following one on MSSQL :
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT TOP 1 COL99 FROM ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM TABLE AS T
First, I tried to reproduce the query on BQ like the following :
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT COL99 FROM PROJECT.DATASET.ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC LIMIT 1)
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
But it leads to the error :
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
I can understand this error, I agree that the original query is not very optimized since a subselect can be executed for every rows in the table.
Knowing that I tried the following which doesn't lead to an error but give wrong (too much) results :
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE AT.COL99
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
COL99,
COL8,
COL9
ROW_NUMBER() OVER (PARITION BY COL8 ORDER BY COL9 DESC) AS rn
) AS TMP
/*WHERE TMP.rn = 1*/
) AS AT
ON AT.COL8 = T.COL2
AND AT.COL9 < T.COL3
This query returns more rows than expected which is normal knowing the condition "AND AT.COL9 < T.COL3", but I have difficulties to find out how to take the minimum ROW_NUMBER value (rn) to reproduce the TOP 1 of the original query.
I tried to put TMP.rn = 1
in the AT table, but the problem is that it is not always the first value that respect the condition AND AT.COL9 < T.COL3
.
To resume, my goal is to be able to reproduce the first query at the top of this question on BigQuery, I've tried something but I am blocking on the how to take the minimum value of ROW_NUMBER (rn) matching the condition AND AT.COL9 < T.COL3
.
Did anyone had a similar use case by any chance ?
Edit : Adding input and output :
COL1 | COL2 | COL3 |
---|---|---|
1234 | AAA | 25/12/2022 |
1235 | BBB | 25/12/2022 |
1236 | CCC | 25/12/2022 |
1337 | AAA | 24/12/2022 |
1238 | AAA | 23/12/2022 |
1239 | AAA | 22/12/2022 |
COL99 | COL8 | COL9 |
---|---|---|
1111 | AAA | 25/12/2022 |
2222 | BBB | 25/12/2022 |
3333 | CCC | 25/12/2022 |
9999 | AAA | 23/12/2022 |
8888 | AAA | 22/12/2022 |
7777 | AAA | 21/12/2022 |
COL1 | COL2 | COL3 | COL4 |
---|---|---|---|
1234 | AAA | 25/12/2022 | AAA |
1235 | BBB | 25/12/2022 | NULL |
1236 | CCC | 25/12/2022 | CCC |
1237 | AAA | 24/12/2022 | 9999 |
1238 | AAA | 23/12/2022 | AAA |
1239 | AAA | 22/12/2022 | 7777 |
Upvotes: 0
Views: 158
Reputation: 347
The query provided by @forpas returns good results in my example but does not return the result I am waiting for in my real use case.
But @forpas's idea inspired me and I found a way to resolve my problem.
It gives the same result in the link provided by @forpas and the query looks like this in MySQL :
SELECT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE AT1.COL99
END AS COL4
FROM FIRST_TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
AT.COL99,
T.COL2,
T.COL3,
ROW_NUMBER() OVER (PARTITION BY T.COL3, T.COL2, AT.COL8 ORDER BY AT.COL9 DESC) AS COUNTER
FROM ANOTHER_TABLE AS AT
INNER JOIN FIRST_TABLE AS T
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3) TEMP
WHERE TEMP.COUNTER = 1
) AS AT1
ON AT1.COL2 = T.COL2 AND AT1.COL3 = T.COL3 ;
The query might be complex for nothing and if someone has something more optimized I would be happy to try it.
Thank you @forpas for the proposal !
Upvotes: 1
Reputation: 164099
You can use FIRST_VALUE()
window function:
SELECT DISTINCT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE FIRST_VALUE(AT.COL99) OVER (PARTITION BY T.COL1, T.COL2, T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM FIRST_TABLE AS T LEFT JOIN ANOTHER_TABLE AS AT
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 AND T.COL1 % 2 <> 0;
If COL1
is unique in the first table, you can simplify the PARTITION BY
clause to:
OVER (PARTITION BY T.COL1 ORDER BY AT.COL9 DESC)
See the demo (for MySql but it is standard SQL).
Upvotes: 1