Reputation: 25
Is it possible to delete data from a BigQuery table while loading data into it from an Apache Beam pipeline?
Our use case is that we need to delete data older than 3 days from the table, based on a timestamp field (the time when Dataflow pulls the message from the Pub/Sub topic).
Is it recommended to do something like this? If so, is there any way to achieve it?
Thank You.
Upvotes: 2
Views: 1061
Reputation: 1099
I think the best way of doing this is to set up your table as a partitioned table (based on ingestion time), see https://cloud.google.com/bigquery/docs/partitioned-tables. You can then drop old partitions manually:
bq rm 'mydataset.mytable$20160301'
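For your 3-day retention use case, a minimal sketch of how this could be scripted (assuming GNU date and a table named mydataset.mytable, both hypothetical for illustration):

# Compute the partition from 3 days ago and drop it without a confirmation prompt
PARTITION=$(date -d '3 days ago' +%Y%m%d)
bq rm -f -t "mydataset.mytable\$${PARTITION}"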
You can also set a partition expiration time:
bq update --time_partitioning_expiration [INTEGER] [PROJECT_ID]:[DATASET].[TABLE]
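The integer is the expiration in seconds. For example, to keep partitions for 3 days (3 × 86400 = 259200 seconds) on a hypothetical table mydataset.mytable:

# Partitions older than 3 days are deleted automatically
bq update --time_partitioning_expiration 259200 mydataset.mytable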
If ingestion time does not work for you, you can look into column-based partitioning: https://cloud.google.com/bigquery/docs/creating-column-partitions. It is still in beta, but it works reliably; it is your call.
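A sketch of creating such a table with the bq CLI, assuming a TIMESTAMP column named event_ts and a simple schema (both names are placeholders, not from the original question):

# Create a table partitioned on the event_ts column, with 3-day partition expiration
bq mk --table \
  --time_partitioning_field event_ts \
  --time_partitioning_type DAY \
  --time_partitioning_expiration 259200 \
  mydataset.mytable \
  event_ts:TIMESTAMP,payload:STRING

With this setup, your Beam pipeline writes to the table as usual, and BigQuery handles dropping the old data for you.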
Upvotes: 2