Reputation: 21
I have a problem related to storing data on GCP BigQuery. I have a partitioned table which is about 10 terabytes in size and growing day by day. How can I store this data with minimum cost and maximum performance?
First option: I can keep the last month's data in BigQuery and move the rest of the data to GCS. Second option: delete everything older than one month, but this option seems illogical to me.
What do you think about this issue?
Upvotes: 0
Views: 110
Reputation: 3528
The best solution is to use a BigQuery table that is partitioned by a useful date column. A large part of this table will then be charged at the lower long-term storage rate. Also consider choosing a lower-cost region for your whole project, if this is possible for your organisation, because all data that is queried together needs to be in the same region.
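As a rough sketch (the table and column names below are only placeholders, not taken from your setup), such a date-partitioned table could be created with the Python BigQuery client like this:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project and credentials

# Hypothetical table and columns; adjust to your actual schema.
table_id = "my-project.my_dataset.events"
schema = [
    bigquery.SchemaField("event_date", "DATE"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField("payload", "STRING"),
]

table = bigquery.Table(table_id, schema=schema)
# Partition by the date column so queries can prune partitions and
# partitions that are not modified for 90 days move to long-term storage pricing.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)

table = client.create_table(table)
print(f"Created {table.full_table_id}, partitioned on {table.time_partitioning.field}")
```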
For each query, only the scanned partitions (the time range you filter on) and the selected columns are charged.
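For example, you can check with a dry run how many bytes a query over the last 30 days would scan before actually running it (again assuming the placeholder table above):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Only the selected columns and the filtered partitions count towards the bytes billed.
sql = """
    SELECT user_id, payload
    FROM `my-project.my_dataset.events`
    WHERE event_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
"""

# A dry run reports the bytes that would be processed without executing the query.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)
print(f"This query would process {job.total_bytes_processed / 1e9:.2f} GB")
```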
There is also the option of using external tables for files stored in GCS. These have some drawbacks: for each query the complete data is read and charged. There are some partitioning possibilities using hive partition keys (https://cloud.google.com/bigquery/docs/hive-partitioned-queries). It is also not possible to precalculate the cost of a query on an external table, which is very bad for testing and debugging.
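If you do go that way, a minimal sketch of defining a hive-partitioned external table with the Python client could look like this (the bucket layout gs://my-bucket/events/dt=YYYY-MM-DD/ is a hypothetical example):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Files are assumed to be laid out as gs://my-bucket/events/dt=2023-01-01/part-000.parquet, ...
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/events/*"]

hive_options = bigquery.HivePartitioningOptions()
hive_options.mode = "AUTO"  # infer the partition key (dt) from the directory names
hive_options.source_uri_prefix = "gs://my-bucket/events/"
external_config.hive_partitioning = hive_options

# The external table only stores metadata; the data stays in GCS.
table = bigquery.Table("my-project.my_dataset.events_external")
table.external_data_configuration = external_config
client.create_table(table)
```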
If you only need the last month's data for your daily reports, it is enough to store that data in BigQuery and keep the rest in GCS. If you only need to run queries over longer time ranges once a month, you can load the data from GCS into BigQuery and delete the table after your queries.
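A sketch of that monthly workflow with the Python client, with placeholder URIs and table names:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical archive location and temporary table name.
uri = "gs://my-bucket/archive/2022/*.parquet"
table_id = "my-project.my_dataset.archive_temp"

# Load the archived files from GCS into a temporary native table.
load_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)
client.load_table_from_uri(uri, table_id, job_config=load_config).result()

# Run the monthly query against the loaded table.
rows = client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result()
for row in rows:
    print(row.n)

# Drop the table again so it does not keep accumulating storage cost.
client.delete_table(table_id, not_found_ok=True)
```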
Upvotes: 1