gerardoherreras
gerardoherreras

Reputation: 33

How can I partition a table in BigQuery by date when the data type is integer?

At the moment I have a table with an integer field where a date value comes in YYYYMMDD format, the objective is to be able to partition the table by this field without the need to create another field or column

Do you have a sample code for partitioning?

enter image description here

Thank you

Upvotes: 0

Views: 1971

Answers (1)

narendra solanki
narendra solanki

Reputation: 412

To create an integer based partition table, One has to use GENERATE_ARRAY function.

RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(start_expression, end_expression[, step_expression]))

Considering a field like date (formatted as YYYYMMDD),sequentially generated 10000 numbers range(20200101-20210100) start from 20200101 would store around one year dates from (2020-01-01 to 2020-12-31) using total partition 1131 (30/31 per 100 partitions) and other partitions range (20201232-20210100) would not be used since those not valid formatted date.

Now coming to answer, if you have dates around for a year use below code or one which is also documented.

CREATE TABLE DATASET.T_TABLE_PAR
PARTITION BY RANGE_BUCKET(COD_DATE,GENERATE_ARRAY(20200101,20201231,1))
AS
SELECT  COD_DATE,CLUSTER FROM  DATASET.T_TABLE

And I believe you have more than a year date data. So answer is, there is no direct way you can do that ('without the need to create another field or column') .

I can think of a work around if you are willing to store changed data (20200201 as 20200132).

--This function does the change, actual date to partition number and vice-versa, change start and end date as per your need.
create temp function date_get_set(date_code int64,type string)
as(
(
SELECT case when type='get' then a
            when type='set' then b end
          from  (
SELECT cast(FORMAT_DATE("%Y%m%d", example) as int64) a,
       20200100+row_number() over(order by example asc) b
from UNNEST(GENERATE_DATE_ARRAY('2020-01-01', '2021-12-31')) AS example)
where (case when  type='get' then b
            when type='set' then a end) = date_code
)

);
select date_get_set(20200132,'get') get_the_date,date_get_set(20200201,'set') set_the_partiontion

Output of above select :

enter image description here

Now create the above function (Persistent- remove temp) and partition table as below.

CREATE TABLE DATASET.T_TABLE_PAR
PARTITION BY RANGE_BUCKET(COD_DATE,GENERATE_ARRAY(20200101,20202020,1))
AS
SELECT  DATASET.date_get_set(COD_DATE,'set') COD_DATE,CLUSTER FROM  DATASET.T_TABLE

To prune the query and read right date value use the function.

SELECT  DATASET.date_get_set(COD_DATE,'get'),CLUSTER 
FROM  DATASET.T_TABLE
where  COD_DATE = DATASET.date_get_set(20200201,'set')

So two disadvantaged, Function may increases some computational cost and the changed data(a view on the table would help further) . :)

Upvotes: 1

Related Questions