medvedev1088
medvedev1088

Reputation: 3745

Why does BigQuery scan entire table although it's partitioned by hour?

This table is partitioned by hour:

SELECT *
FROM `blockchain-etl-internal.crypto_ethereum_partitioned.logs_by_topic_0xd78`
WHERE block_timestamp >= '2020-11-14 00:00:00' and block_timestamp < '2020-11-14 01:00:00'
ORDER BY block_timestamp DESC

But whatever filter on the block_timestamp I specify BigQuery scans the entire table. You can see that table size and the amount of data scanned in a query to make sure.

Isn't BigQuery supposed to only scan data in partitions that are filtered out?

Upvotes: 0

Views: 486

Answers (1)

Nhan Nguyen
Nhan Nguyen

Reputation: 410

This is because all rows in the table is still in the UNPARTITIONED partition and has not been repartitioned into their corresponding partitions. Repartitioning is triggered only when there's enough data (byte size is at least a certain threshold) (https://cloud.google.com/bigquery/streaming-data-into-bigquery#streaming_into_partitioned_tables).

At the moment, this threshold is set at 5gb, while the table has around 400mb as you stated.

Upvotes: 1

Related Questions