Reputation: 105
Hello,
I have a little headache. I want to create buckets and copy a bulk of files into them at the same time. I have multiple folders (one per dataset name) inside a schema folder, each containing JSON files: schema/dataset1, schema/dataset2, schema/dataset3.
The trick is that Terraform generates the bucket name plus a random suffix to avoid names that are already taken. I have one question:
How do I copy bulk files into a bucket (at the same time as the bucket creation)?
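For reference, the random suffix used below comes from a random_id resource, roughly like this (the byte_length value is just an example):
resource "random_id" "suffix" {
  # One shared suffix, referenced as random_id.suffix[0].hex
  count       = 1
  byte_length = 4
}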
resource "google_storage_bucket" "map" {
for_each = {for i, v in var.gcs_buckets: i => v}
name = "${each.value.id}_${random_id.suffix[0].hex}"
location = var.default_region
storage_class = "REGIONAL"
uniform_bucket_level_access = true
#If you destroy your bucket, this option will delete all objects inside this bucket
#if not Terrafom will fail that run
force_destroy = true
labels = {
env = var.env_label
}
resource "google_storage_bucket_object" "map" {
for_each = {for i, v in var.json_buckets: i => v}
name = ""
source = "schema/${each.value.dataset_name}/*"
bucket = contains([each.value.bucket_name], each.value.dataset_name)
#bucket = "${google_storage_bucket.map[contains([each.value.bucket_name], each.value.dataset_name)]}"
}
variable "json_buckets" {
type = list(object({
bucket_name = string
dataset_name = string
}))
default = [
{
bucket_name = "schema_table1",
dataset_name = "dataset1",
},
{
bucket_name = "schema_table2",
dataset_name = "dataset2",
},
{
bucket_name = "schema_table2",
dataset_name = "dataset3",
},
]
}
variable "gcs_buckets" {
type = list(object({
id = string
description = string
}))
default = [
{
id = "schema_table1",
description = "schema_table1",
},
]
}
...
Upvotes: 3
Views: 2926
Reputation: 4502
Why do you have bucket = contains([each.value.bucket_name], each.value.dataset_name)? The contains function returns a bool, and bucket takes a string input (the name of the bucket).
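If you want to point an object at the bucket that was actually created (including the random suffix), reference the bucket resource's name attribute directly; for example (the map key here is just illustrative and depends on your for_each keys):
bucket = google_storage_bucket.map["0"].name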
There is no resource that will allow you to copy multiple objects at once to the bucket. If you need to do this in Terraform, you can use the fileset function to get a list of files in your directory, then use that list in your for_each for the google_storage_bucket_object. It might look something like this (untested):
locals {
  // Create a master map that has all files for all buckets
  all_files = merge([
    // Loop through each bucket/dataset combination
    for bucket_idx, bucket_data in var.json_buckets :
    {
      // For each bucket/dataset combination, get a list of all files in that dataset
      for file in fileset("schema/${bucket_data.dataset_name}/", "**") :
      // And stick it in a map of all bucket/file combinations
      "bucket-${bucket_idx}-${file}" => merge(bucket_data, {
        file_name = file
      })
    }
  ]...)
}
resource "google_storage_bucket_object" "map" {
for_each = local.all_files
name = each.value.file_name
source = "schema/${each.value.dataset_name}/${each.value.file_name}"
bucket = each.value.bucket_name
}
WARNING: Do not do this if you have a lot of files to upload. This will create a resource in the Terraform state file for each uploaded file, meaning that every time you run terraform plan or terraform apply, it will make an API call to check the status of each uploaded file. It will get very slow very quickly if you have hundreds of files to upload.
If you have a ton of files to upload, consider using an external CLI-based tool to sync the local files with the remote bucket after the bucket is created. You can use a module such as this one to run external CLI commands.
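As a rough sketch of that approach (not the linked module itself; it assumes gsutil is installed on the machine running Terraform, and the "0" map key is just illustrative):
resource "null_resource" "sync_schema_files" {
  # Re-run the sync whenever the target bucket is (re)created
  triggers = {
    bucket = google_storage_bucket.map["0"].name
  }

  provisioner "local-exec" {
    # Sync the local dataset folder into the bucket
    command = "gsutil -m rsync -r schema/dataset1 gs://${self.triggers.bucket}"
  }
}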
Upvotes: 3