Dmitry L.
Dmitry L.

Reputation: 33

BigQuery: cost of querying tables partitioned by ingestion time vs date/timestamp partitioned

We are trying to build (or better say rebuild) our DWH in the cloud based on BigQuery. We decided to use 'partitioned by date field' tables (like a 'created_date' field) for our raw data instead of ingestion time partitions because with this feature we can load data easely and then query it with "group by" partition date column, build datamarts bla bla bla. We supposed that this partition method will increase queries speed and reduce it cost (versus non-partitioned tables - yes), BUT we've discovered than when you querying table with WHERE by partition field (like 'select count(*) from table where created_date=current_date'), it will cost money.

Our old-style ingestion time partitioned table queries with WHERE _PARTITIONTIME ='' were FREE! (like 'select count(*) from table where _PARTITIONTIME=current_date')

For example:

1) select value1 from table1 where _PARTITIONTIME = current_date

2) select value1 from table1 where created_date = current_date

3) select count(*) from table1 where _PARTITIONTIME = current_date

The second query costs more, because it will scan 2 columns. Its logical. But not fair((( The 3rd query is absolutely free btw!

This is very sad situation, because there is NO ANY WARNING about this 'side effect' in the documentation. This feature designed to make DB developers life easier (i guess), and it positioned as best practice feature and highly recommended by Google. But nobody said that it will cost you additional money also!

So the question is can we somehow query date-field partitioned tables using partition key for free? Is there any other pseudocolumn or method of filtering by partition key available if you use date/timestamp field based partitioning?

(ps: you guys from google must add some pseudocolumn for the date/timestamp partition method if it does not exist).

Thnx!

Upvotes: 3

Views: 1753

Answers (1)

Sergi Muñoz
Sergi Muñoz

Reputation: 75

So the question is can we somehow query date-field partitioned tables using partition key for free?

The answer is No, querying the partition will not be free.

Is there any other pseudocolumn or method of filtering by partition key available if you use date/timestamp field based partitioning?

If you want partitioning by date, this can only be achieved using ingestion-time partitioning with the _PARTITIONTIME pseudocolumn or using dates value in a selected date/timestamp value columns. Currently there is no alternative option available. Keep in mind that one of the main goals of partitioning is reducing the amount of data being scanned mainly by reducing the number of rows that are scanned.

You guys from google must add some pseudocolumn for the date/timestamp partition method if it does not exist

I understand that you would like to have some pseudocolumn for the data column partitioned method, but could you please elaborate a bit more what values you would like to see in this partition in your original post?

Edit: A feature request has been opened on your behalf. You can follow it here

Upvotes: 0

Related Questions