Reputation: 11
I have a 120 TB table which unfortunately was not partitioned on creation. The jobs that use it filter it by a date column called sales_date and are taking hours to complete. I am trying to find the best way to introduce partitions into the table. I tried a basic approach
create table dev_abhirami.sales_copy
partition by DATE_TRUNC(sales_date, MONTH)
cluster by sales_date
as select * from `src_dataset.sales`
I cannot partition by sales_date since it created more than 4000 partitions.
Upvotes: 1
Views: 45
Reputation: 208042
In this case just cluster by date, it will yield better results and clustering works on more than 4000 partitions. also you can choose to cluster further by using other columns, clustering is a box in a box model
Upvotes: 1