opensourcegeek
opensourcegeek

Reputation: 5962

How bigquery's time_partitioning_expiration parameter work?

I've created a table with partition type day, and I have set time_partitioning_expiration to 1209600 seconds (14 days) from bq command line tool. I have verified if the settings are correct by running bq show on table and I can see

"timePartitioning": {
    "expirationMs": "1209600000", 
    "type": "DAY"
  }, 
  "type": "TABLE"

However there seems to be data in the partitions that I expected to have been deleted.

SELECT 
  count(*) as c,
  _partitiontime as pDate
FROM [poc.reporting]
group by pDate
; 

1   373800  2016-07-17 00:00:00 UTC  
2   640800  2016-07-18 00:00:00 UTC  
3   373800  2016-07-16 00:00:00 UTC  

Is my understanding that setting time_partition_expiration will remove/delete partitions internally correct? If so why hasn't partitions been deleted in my case?

Upvotes: 3

Views: 4345

Answers (2)

Chris Vale
Chris Vale

Reputation: 61

It looks like the behavior described in the top answer from Pavan changed on Dec 13, 2016, and expiration time is now based on the partition date, and NOT the last modified date.

From the most recent docs:

For example, if the partition expiration is set to 60 days and data is written to partition 20170101 on January 30, 2017, then the data expires on January 1, 2017 + 60 days.

Reference: BQ Partitioned Tables

Upvotes: 6

Pavan Edara
Pavan Edara

Reputation: 2315

Expiration of partitions is not based on the partition's date but it is based on the time at which storage was exactly added to the table. So, for example, for a table with partition expiration of 14 days, if a partition 20160717's data was restated on 20160730, the data for that partition would expire 14days from 20160730. You can verify the write time of the oldest data in the partition using the following query:

SELECT MSEC_TO_TIMESTAMP(creation_time) from [project:dataset.table$__PARTITIONS_SUMMARY__]

Upvotes: 3

Related Questions