Reputation: 189
I want to run a query where the column by which to partition by is picked up from another table. The idea is to write a generic skeleton script, and in the future change the where clause in the partition by to a dynamic quantity.
In BigQuery, we have the following table available, which tells us the primary key for each table, PK_TABLE:
table_name | column_name |
---|---|
sometable | id |
What I'd like to do, is for each table in the database, run a script like this:
SELECT *
from (
SELECT *,
RANK() OVER (PARTITION BY
(SELECT COLUMN_NAME FROM PK_TABLE WHERE TABLE_NAME='SOMETABLE')
ORDER BY TIME DESC) AS RANK
FROM SOMETABLE
) A
WHERE A.RANK=1
I would expect the above table to return the exact same result as the following code:
SELECT *
from (
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY TIME DESC) AS RANK
FROM SOMETABLE
) A
WHERE A.RANK=1
But the two results are incredibly different. Is what I am trying with the first script possible?
The only reason I can think that the results are different is when I use a SELECT statement in PARTITION BY, it probably uses it as a literal and computes an unexpected results, as opposed to actually passing a column name to the query.
Would be good to have this validated.
Upvotes: 1
Views: 9747
Reputation: 189
As suggested by @JNevill and @ndk's post, the solution is to create a dynamic SQL with two steps.
For those who don't want to go through the link, the solution looks like this:
DECLARE col0 string;
set col0 = (SELECT COLUMN_NAME FROM PK_TABLE WHERE TABLE_NAME='SOMETABLE')
EXECUTE IMMEDIATELY format('''
SELECT *
from (
SELECT *,
RANK() OVER (PARTITION BY %s ORDER BY TIME DESC) AS RANK
FROM SOMETABLE
) A
WHERE A.RANK=1
''', col0);
Upvotes: 2