Reputation: 35
I have a table that accumulates data daily . I want data older than xx months to be deleted automatically. How can I use BQ command to set it up. ( The table is not partitioned ).
Upvotes: 1
Views: 2767
Reputation: 832
Way to do it with BQ command when table is not partition as asked in the question:
Without partition:
bq query --nouse_legacy_sql \
'delete
FROM
`{PROJECT}`.{DATASET}.{TABLE}' where DATE({YOUR_DATETIME_OR_TIMESTAMP_COL}) < DATE_SUB(CURRENT_DATE(), INTERVAL {NUMBER_OF_MONTHS} MONTH)
You will have to use this in some kind of wrapper script to execute it daily.
Or may be best way to schedule it is with: using Magnus from potens.io (https://potens.io/products/#magnus)
New way of auto deletions: All you need to know is partitioned tables: https://cloud.google.com/bigquery/docs/partitioned-tables
Following is a sample for understanding:
CREATE TABLE tmp.gbq_partition_table_sample
(col_1 string, col_ts timestamp, col_2 INT64, col_3 INT64
)
PARTITION BY DATE(col_ts)
OPTIONS(
partition_expiration_days=30,
description="partitioned by date"
)
Here I have col_ts as a column where data's timestamp has been stored. Same column is also partitioned on DATE(col_ts) and partition expiration is added as 30 days.
This table will keep accumulating data on daily basis and "automatically" keep deleting data that is older than 30 days.
This is the way you do not need to write any custom jobs to clean up your data.
Other way is if you have full table that needs to be deleted after certain days, you can always change expiration date of the table itself.
Upvotes: 3