John

Reputation: 73

Terraform BigQuery replaces table & deletes the data when schema gets updated

I have roughly the following folder structure:

.
├── locals.tf
├── main.tf
├── modules
│   ├── bigquery
│   │   ├── main.tf
│   │   ├── schema
│   │   └── variables.tf
│   ├── bigquery_tables
│   │   ├── main.tf
│   │   ├── schema
│   │   └── variables.tf
│   ├── bigquery_views
│   │   ├── main.tf
│   │   ├── queries
│   │   ├── schema
│   │   └── variables.tf
│   ├── cloud_composer
│   └── list_projects
├── providers.tf
├── storage.tf
├── variables.tf
└── versions.tf

and my main.tf in bigquery_tables is:

resource "google_bigquery_table" "bq_tables" {
  for_each            = { for table in var.bigquery_dataset_tables : table.table_id => table }
  project             = var.project_id
  dataset_id          = each.value.dataset_id
  table_id            = each.value.table_id
  schema              = file(format("${path.module}/schema/%v.json", each.value.file_name))
  deletion_protection = false
  dynamic "time_partitioning" {
    for_each = try(each.value.time_partition, false) == false ? [] : [each.value.time_partition]
    content {
      type  = each.value.partitioning_type
      field = each.value.partitioning_field
    }
  }
}

The issue I am facing is that, since we are still in the development stage, the schema changes frequently. Every schema change causes Terraform to replace the BigQuery table, which also leads to data loss in BigQuery.

Can someone suggest what I should add to my resource block to avoid the table being replaced and the data being lost?

I am also unsure how I can add "external_data_configuration" to my current block, as suggested in https://github.com/hashicorp/terraform-provider-google/issues/10919.
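
Based on the provider documentation, I think it would look roughly like the block below (the source format and URIs are placeholders I made up), but I am not sure whether it is the right approach for my case:

resource "google_bigquery_table" "bq_tables" {
  # ... existing arguments ...

  # hypothetical example: source_format and source_uris are placeholders
  external_data_configuration {
    autodetect    = true
    source_format = "NEWLINE_DELIMITED_JSON"
    source_uris   = ["gs://my-bucket/path/*.json"]
  }
}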

Upvotes: 2

Views: 4156

Answers (1)

Mazlum Tosun

Reputation: 6582

Unfortunately, if you change the schema structure of the BigQuery table, you will get this behaviour.

For the deletion_protection param, I think it's better to set it to true to prevent data loss, or not to set it at all (it's true by default).
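
For example, with the protection enabled, Terraform refuses to apply any plan that would destroy or replace the table:

resource "google_bigquery_table" "bq_tables" {
  # ...
  deletion_protection = true
}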

The solution I would suggest, because you are in dev mode, is to use Terraform workspaces and run your schema updates in a separate workspace.

This way, each apply creates a new dataset or table per workspace, so your original data is left untouched. For example:

locals.tf file :

locals {
  workspace = terraform.workspace != "default" ? "${terraform.workspace}_" : ""
}

main.tf file :

resource "google_bigquery_table" "bq_tables" {
  for_each            = { for table in var.bigquery_dataset_tables : table.table_id => table }
  project             = var.project_id
  dataset_id          = "${local.workspace}${each.value.dataset_id}"
  table_id            = each.value.table_id
  schema              = file(format("${path.module}/schema/%v.json", each.value.file_name))
  deletion_protection = false
  dynamic "time_partitioning" {
    for_each = try(each.value.time_partition, false) == false ? [] : [each.value.time_partition]
    content {
      type  = each.value.partitioning_type
      field = each.value.partitioning_field
    }
  }
}

In this example, I used the workspace name as a prefix for the dataset ID. If the default workspace is used, the prefix is empty; otherwise it is the given workspace name:

dataset=mydataset

- workspace=default => dataset=mydataset
- workspace=evolfeature1 => dataset=evolfeature1_mydataset
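
To run your schema updates in such a workspace, create and select it before applying, e.g.:

terraform workspace new evolfeature1
terraform apply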

Upvotes: 0
