SQL Query within a Partition By statement in BigQuery

Question

I want to run a query where the column by which to partition by is picked up from another table. The idea is to write a generic skeleton script, and in the future change the where clause in the partition by to a dynamic quantity.

In BigQuery, we have the following table available, which tells us the primary key for each table, PK_TABLE:

table_name	column_name
sometable	id

What I'd like to do, is for each table in the database, run a script like this:

SELECT * 
from (
  SELECT *, 
        RANK() OVER (PARTITION BY 
            (SELECT COLUMN_NAME FROM PK_TABLE WHERE TABLE_NAME='SOMETABLE') 
                ORDER BY TIME DESC) AS RANK
  FROM SOMETABLE
  ) A 
WHERE A.RANK=1

I would expect the above table to return the exact same result as the following code:

SELECT * 
from (
  SELECT *, 
        RANK() OVER (PARTITION BY ID ORDER BY TIME DESC) AS RANK
  FROM SOMETABLE
  ) A 
WHERE A.RANK=1

But the two results are incredibly different. Is what I am trying with the first script possible?

The only reason I can think that the results are different is when I use a SELECT statement in PARTITION BY, it probably uses it as a literal and computes an unexpected results, as opposed to actually passing a column name to the query.

Would be good to have this validated.

Akshay · Accepted Answer

As suggested by @JNevill and @ndk's post, the solution is to create a dynamic SQL with two steps.

For those who don't want to go through the link, the solution looks like this:

DECLARE col0 string;
set col0 = (SELECT COLUMN_NAME FROM PK_TABLE WHERE TABLE_NAME='SOMETABLE')

EXECUTE IMMEDIATELY format('''
SELECT * 
from (
  SELECT *, 
        RANK() OVER (PARTITION BY %s ORDER BY TIME DESC) AS RANK
  FROM SOMETABLE
  ) A 
WHERE A.RANK=1
''', col0);

SQL Query within a Partition By statement in BigQuery

Answers (1)

Related Questions