Sekhar
Sekhar

Reputation: 699

Create partition on bigquery table using terraform

Description

I have list of bigquery tables to be created using terraform but I need only the partition for specific tables.

Here is the ex.

locals {
    path = "../../../../../../../../db"

    gcp_bq_tables = [
        "my_table1",
        "my_table1_daily",
        "my_table2",
        "my_table2_daily"
    ]
}

And, the terraform script to create the tables:

resource "google_bigquery_table" "gcp_bq_tables" {
    for_each    = toset(local.gcp_bq_tables)
    dataset_id  = google_bigquery_dataset.gcp_bq_db.dataset_id
    table_id    = each.value
    schema      = file("${local.path}/schema/${each.value}.json")
    labels = {
        env = var.env
        app = var.app
    }
}

In that I need to create partition on timestamp, type as DAY but the columns are different. Lets say for my_table1,

  1. The partition column would be my_ts_column_table1 for table1
  2. The partition column would be my_last_modified_column_table2 for table2

How to write the terraform script in this scenario.

My exploration

I find a way to do it in terraform_documentation but not sure for multiple tables and how can be specified the partition columns for both tables.

Upvotes: 0

Views: 3343

Answers (2)

Mazlum Tosun
Mazlum Tosun

Reputation: 6582

I hope this solution can help.

You can configure a json file to create dynamically your tables with partitions.

tables.json file

{
    "tables": {
        "my_table1": {
            "dataset_id": "my_dataset",
            "table_id": "my_table",
            "schema_path": "folder/myschema.json",
            "partition_type": "DAY",
            "partition_field": "partitionField",
            "clustering": [
                "field",
                "field2"
            ]
        },
        "my_table2": {
            "dataset_id": "my_dataset",
            "table_id": "my_table2",
            "schema_path": "folder/myschema2.json",
            "partition_type": "DAY",
            "partition_field": "partitionField2",
            "clustering": [
                "field",
                "field2"
            ]
        }
}

Then retrieve your tables from Terraform local file.

locals.tf file :

locals {
  tables = jsondecode(file("${path.module}/resource/tables.json"))["tables"]
}

I put a default partition in variables.tf file on myDefaultDate field :

variable "time_partitioning" {
  description = "Configures time-based partitioning for this table. cf https://www.terraform.io/docs/providers/google/r/bigquery_table.html#field"
  type = map(string)
  default = {
    type = "DAY"
    field = "myDefaultDate"
  }
}

In the resource.tf file, I used a dynamic bloc :

  • if the partition exists in the current table from the Json matadata configuration file tables.json, I take it.

  • Otherwise I take the default partition given by the variables.tf file.

resource.tf file :

resource "google_bigquery_table" "tables" {
  for_each = local.tables
  project = var.project_id
  dataset_id = each.value["dataset_id"]
  table_id = each.value["table_id"]
  clustering = try(each.value["clustering"], [])

  dynamic "time_partitioning" {
    for_each = [
      var.time_partitioning
    ]
    content {
      type = try(each.value["partition_type"], time_partitioning.value["type"])
      field = try(each.value["partition_field"], time_partitioning.value["field"])
      expiration_ms = try(time_partitioning.value["expiration_ms"], null)
      require_partition_filter = try(time_partitioning.value["require_partition_filter"], null)
    }
  }

  schema = file("${path.module}/resource/schema/${each.value["schema_path"]}")
}

Upvotes: 2

Marko E
Marko E

Reputation: 18213

In this case it might be the best to use dynamic [1] with for_each meta-argument [2] to achieve what you want. The code would have to be changed to:

resource "google_bigquery_table" "gcp_bq_tables" {
    for_each    = toset(local.gcp_bq_tables)
    dataset_id  = google_bigquery_dataset.gcp_bq_db.dataset_id
    table_id    = each.value
    schema      = file("${local.path}/schema/${each.value}.json")

    dynamic "time_partitioning" {
      for_each = each.value == "table1" || each.value == "table2" ? [1] : []
      content {
        type  = "DAY"
        field = each.value == "table1" ? "my_ts_column_table1" : "my_last_modified_column_table2"
      }
    }

    labels = {
        env = var.env
        app = var.app
    }
}

[1] https://developer.hashicorp.com/terraform/language/expressions/dynamic-blocks

[2] https://developer.hashicorp.com/terraform/language/meta-arguments/for_each

Upvotes: 2

Related Questions