Reputation: 83
I'm trying to optimize our BigQuery model by using clustered tables.
I'm testing these scenarios:
Without apply any where condition scenario 1 and 2 have equal cost(time and bytes processed). When I apply a condition by a clustered field 1 is 4x time faster and cheaper.
Clustered fields are only useful when you use a condition in the query? and not in a join? In this case, if I perform a join without any condition, the performance is the same with or without the clusters
How can I improve a join between two tables in BigQuery?
EDIT 2021-05-31
Add query execution plan of both jobs:
Upvotes: 8
Views: 3618
Reputation: 87
From docs, I would say that the cluster will be simply ignore as you are comparing using another column, during the Join.
Now, for optimizing a join, you could try to reduce data before the join. For instance, try filtering the tables, or pre-aggregating them to reduce as much data as you can. Finally, also take care of the order of the tables on the join. Order them from the biggest to the smallest one.
Upvotes: 3