Reputation: 3745
This table is partitioned by hour:
SELECT *
FROM `blockchain-etl-internal.crypto_ethereum_partitioned.logs_by_topic_0xd78`
WHERE block_timestamp >= '2020-11-14 00:00:00' and block_timestamp < '2020-11-14 01:00:00'
ORDER BY block_timestamp DESC
But whatever filter on the block_timestamp I specify BigQuery scans the entire table. You can see that table size and the amount of data scanned in a query to make sure.
Isn't BigQuery supposed to only scan data in partitions that are filtered out?
Upvotes: 0
Views: 486
Reputation: 410
This is because all rows in the table is still in the UNPARTITIONED
partition and has not been repartitioned into their corresponding partitions. Repartitioning is triggered only when there's enough data (byte size is at least a certain threshold) (https://cloud.google.com/bigquery/streaming-data-into-bigquery#streaming_into_partitioned_tables).
At the moment, this threshold is set at 5gb, while the table has around 400mb as you stated.
Upvotes: 1