Ferdi777
Ferdi777

Reputation: 347

BigQuery reproducing a query that use another table in a subselect

I am blocked on reproducing in BigQuery a query that is similar to the following one on MSSQL :

SELECT 
  COL1,
  COL2, COL3,
  CASE
    WHEN ( COL1 % 2 ) = 0 THEN COL2
    ELSE (SELECT TOP 1 COL99 FROM ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC)
  END AS COL4

FROM TABLE AS T

First, I tried to reproduce the query on BQ like the following :

SELECT 
  COL1,
  COL2, COL3,
  CASE
    WHEN ( COL1 % 2 ) = 0 THEN COL2
    ELSE (SELECT COL99 FROM PROJECT.DATASET.ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC LIMIT 1)
  END AS COL4

FROM PROJECT.DATASET.TABLE AS T

But it leads to the error : Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.

I can understand this error, I agree that the original query is not very optimized since a subselect can be executed for every rows in the table.

Knowing that I tried the following which doesn't lead to an error but give wrong (too much) results :

SELECT 
  COL1,
  COL2, COL3,
  CASE
    WHEN ( COL1 % 2 ) = 0 THEN COL2
    ELSE AT.COL99
  END AS COL4

FROM PROJECT.DATASET.TABLE AS T
LEFT JOIN (
  SELECT * FROM (
    SELECT 
       COL99,
       COL8,
       COL9
       ROW_NUMBER() OVER (PARITION BY COL8 ORDER BY COL9 DESC) AS rn
  ) AS TMP
  /*WHERE TMP.rn = 1*/
) AS AT
ON AT.COL8 = T.COL2
AND AT.COL9 < T.COL3

This query returns more rows than expected which is normal knowing the condition "AND AT.COL9 < T.COL3", but I have difficulties to find out how to take the minimum ROW_NUMBER value (rn) to reproduce the TOP 1 of the original query.

I tried to put TMP.rn = 1 in the AT table, but the problem is that it is not always the first value that respect the condition AND AT.COL9 < T.COL3.

To resume, my goal is to be able to reproduce the first query at the top of this question on BigQuery, I've tried something but I am blocking on the how to take the minimum value of ROW_NUMBER (rn) matching the condition AND AT.COL9 < T.COL3.

Did anyone had a similar use case by any chance ?

Edit : Adding input and output :

COL1 COL2 COL3
1234 AAA 25/12/2022
1235 BBB 25/12/2022
1236 CCC 25/12/2022
1337 AAA 24/12/2022
1238 AAA 23/12/2022
1239 AAA 22/12/2022
COL99 COL8 COL9
1111 AAA 25/12/2022
2222 BBB 25/12/2022
3333 CCC 25/12/2022
9999 AAA 23/12/2022
8888 AAA 22/12/2022
7777 AAA 21/12/2022
COL1 COL2 COL3 COL4
1234 AAA 25/12/2022 AAA
1235 BBB 25/12/2022 NULL
1236 CCC 25/12/2022 CCC
1237 AAA 24/12/2022 9999
1238 AAA 23/12/2022 AAA
1239 AAA 22/12/2022 7777

Upvotes: 0

Views: 158

Answers (2)

Ferdi777
Ferdi777

Reputation: 347

The query provided by @forpas returns good results in my example but does not return the result I am waiting for in my real use case.

But @forpas's idea inspired me and I found a way to resolve my problem.

It gives the same result in the link provided by @forpas and the query looks like this in MySQL :

SELECT T.COL1, T.COL2, T.COL3,
       CASE
         WHEN T.COL1 % 2 = 0 THEN T.COL2
         ELSE AT1.COL99
       END AS COL4
FROM FIRST_TABLE AS T 
LEFT JOIN (
  SELECT * FROM (
    SELECT 
      AT.COL99, 
      T.COL2, 
      T.COL3, 
      ROW_NUMBER() OVER (PARTITION BY T.COL3, T.COL2, AT.COL8 ORDER BY AT.COL9 DESC) AS COUNTER 
    FROM ANOTHER_TABLE AS AT
    INNER JOIN FIRST_TABLE AS T
    ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3) TEMP
  WHERE TEMP.COUNTER = 1
) AS AT1
ON AT1.COL2 = T.COL2 AND AT1.COL3 = T.COL3 ;

The query might be complex for nothing and if someone has something more optimized I would be happy to try it.

Thank you @forpas for the proposal !

Upvotes: 1

forpas
forpas

Reputation: 164099

You can use FIRST_VALUE() window function:

SELECT DISTINCT T.COL1, T.COL2, T.COL3,
       CASE
         WHEN T.COL1 % 2 = 0 THEN T.COL2
         ELSE FIRST_VALUE(AT.COL99) OVER (PARTITION BY T.COL1, T.COL2, T.COL3 ORDER BY AT.COL9 DESC)
       END AS COL4
FROM FIRST_TABLE AS T LEFT JOIN ANOTHER_TABLE AS AT
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 AND T.COL1 % 2 <> 0;

If COL1 is unique in the first table, you can simplify the PARTITION BY clause to:

OVER (PARTITION BY T.COL1 ORDER BY AT.COL9 DESC)

See the demo (for MySql but it is standard SQL).

Upvotes: 1

Related Questions