Query Performance over a BigQuery table with nullable field

Question

We need to upload data from our logs to Google BigQuery and we have two subsets of the log data that will not overlap when queried.

Subset number one has a field "vendor_id", which will be used a lot in WHERE clauses.
Subset number two consists of the log entries that do not have "vendor_id".

We could make a single table with a nullable "vendor_id" field, or make two different tables, one for each subset. Is there any difference in the performance of these approaches?

Regards

Leo

Jordan Tigani · Accepted Answer

There will be little (if any) difference in query performance between the two options you mention. That said, the cost of queries is proportional to the amount of data read, so if you have two separate tables it will likely be less expensive, since each query will read a smaller amount of data.
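To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. All numbers are hypothetical (the row counts and the average bytes per row for the referenced columns are made up for illustration), and it assumes a plain table with no partitioning or clustering, where a WHERE filter on "vendor_id" does not reduce the data read. It also assumes, for simplicity, that rows in both subsets have the same average size.

```python
def bytes_scanned(row_count: int, bytes_per_row: int) -> int:
    """Estimate bytes read when a query scans every row of a table.

    bytes_per_row is the average size of only the columns the query references.
    """
    return row_count * bytes_per_row


# Hypothetical figures, chosen only for illustration.
VENDOR_ROWS = 40_000_000       # log entries that have a vendor_id
NO_VENDOR_ROWS = 60_000_000    # log entries without a vendor_id
BYTES_PER_ROW = 50             # assumed average size of the referenced columns

# One table with a nullable vendor_id: the query scans both subsets.
combined = bytes_scanned(VENDOR_ROWS + NO_VENDOR_ROWS, BYTES_PER_ROW)

# Two tables: a query on vendor entries scans only the vendor table.
split = bytes_scanned(VENDOR_ROWS, BYTES_PER_ROW)

print(f"single table: {combined:,} bytes")
print(f"vendor table only: {split:,} bytes")
print(f"fraction of data read with two tables: {split / combined:.0%}")
```

With these made-up numbers the two-table layout reads 40% of the data for a vendor query. For real tables, a dry run of the query reports the bytes it would process, so you can compare both layouts without guessing.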
