Reputation: 33
At the moment I have a table with an integer field where a date value comes in YYYYMMDD format, the objective is to be able to partition the table by this field without the need to create another field or column
Do you have a sample code for partitioning?
Thank you
Upvotes: 0
Views: 1971
Reputation: 412
To create an integer based partition table, One has to use GENERATE_ARRAY function.
RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(start_expression, end_expression[, step_expression]))
Considering a field like date (formatted as YYYYMMDD),sequentially generated 10000 numbers range(20200101-20210100) start from 20200101 would store around one year dates from (2020-01-01 to 2020-12-31) using total partition 1131 (30/31 per 100 partitions) and other partitions range (20201232-20210100) would not be used since those not valid formatted date.
Now coming to answer, if you have dates around for a year use below code or one which is also documented.
CREATE TABLE DATASET.T_TABLE_PAR
PARTITION BY RANGE_BUCKET(COD_DATE,GENERATE_ARRAY(20200101,20201231,1))
AS
SELECT COD_DATE,CLUSTER FROM DATASET.T_TABLE
And I believe you have more than a year date data. So answer is, there is no direct way you can do that ('without the need to create another field or column') .
I can think of a work around if you are willing to store changed data (20200201 as 20200132).
--This function does the change, actual date to partition number and vice-versa, change start and end date as per your need.
create temp function date_get_set(date_code int64,type string)
as(
(
SELECT case when type='get' then a
when type='set' then b end
from (
SELECT cast(FORMAT_DATE("%Y%m%d", example) as int64) a,
20200100+row_number() over(order by example asc) b
from UNNEST(GENERATE_DATE_ARRAY('2020-01-01', '2021-12-31')) AS example)
where (case when type='get' then b
when type='set' then a end) = date_code
)
);
select date_get_set(20200132,'get') get_the_date,date_get_set(20200201,'set') set_the_partiontion
Output of above select :
Now create the above function (Persistent- remove temp) and partition table as below.
CREATE TABLE DATASET.T_TABLE_PAR
PARTITION BY RANGE_BUCKET(COD_DATE,GENERATE_ARRAY(20200101,20202020,1))
AS
SELECT DATASET.date_get_set(COD_DATE,'set') COD_DATE,CLUSTER FROM DATASET.T_TABLE
To prune the query and read right date value use the function.
SELECT DATASET.date_get_set(COD_DATE,'get'),CLUSTER
FROM DATASET.T_TABLE
where COD_DATE = DATASET.date_get_set(20200201,'set')
So two disadvantaged, Function may increases some computational cost and the changed data(a view on the table would help further) . :)
Upvotes: 1