Reputation: 3294
I'm building out some infrastructure in AWS with Terraform. I have several S3 buckets created and want a Glue crawler to crawl these buckets once per hour. My Terraform Glue catalog db, role, and policy all build fine, but when I try to create the crawler resource by adding four S3 paths to the s3_target{} portion of the crawler, I get a failure:
resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}"

  s3_target {
    # count = "${length(var.data_source_path)}"
    path = "${var.data_source_path}" #"${formatlist("%s", var.data_source_path)}"
  }
}
This causes an error:
Error: aws_glue_crawler.datalake_crawler: s3_target.0.path must be a single value, not a list
I have tried adding a count statement in the s3_target block, but this fails. I have also tried adding "${formatlist("%s", var.data_source_path)}" in the path argument, but this too fails.
Can I add multiple S3 paths to a Glue crawler with Terraform? I can make this happen through the AWS console, but this needs to be done using infrastructure as code.
Upvotes: 5
Views: 7108
Reputation: 56987
To target additional S3 paths you can just repeat the s3_target block multiple times like this:
resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}"

  s3_target {
    path = "${var.data_source_path_1}"
  }

  s3_target {
    path = "${var.data_source_path_2}"
  }
}
This is briefly alluded to in the aws_glue_crawler resource docs where it says:
s3_target (Optional) List nested Amazon S3 target arguments. See below.
You can also see this in the source code for the resource's schema:
"s3_target": {
Type: schema.TypeList,
Optional: true,
MinItems: 1,
Unfortunately, before Terraform 0.12 you can't build these blocks programmatically by looping over a list of dynamic paths, so you need to specify them statically.
Terraform 0.12 will introduce HCL2, which has better support for loops (other than using count), including dynamic blocks, which would then allow you to do something like this:
resource "aws_glue_crawler" "datalake_crawler" {
  database_name = var.glue_db_name
  name          = var.crawler_name
  role          = aws_iam_role.glue.id

  dynamic "s3_target" {
    for_each = var.data_source_paths
    content {
      # Inside a dynamic block the current element is exposed as <block label>.value
      path = s3_target.value
    }
  }
}
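For completeness, here is a rough sketch of how the list variable and an hourly schedule might be wired together on Terraform 0.12 or later. The variable name data_source_paths follows the example above. The bucket paths are placeholders, and the schedule line is my assumption about how the "once per hour" requirement from the question would be written in Glue's cron syntax. var.glue_db_name, var.crawler_name and aws_iam_role.glue are taken from the question's configuration and are assumed to be declared elsewhere.

```hcl
# Hypothetical declaration of the list variable; the paths are placeholders.
variable "data_source_paths" {
  type        = list(string)
  description = "S3 paths the Glue crawler should scan"
  default = [
    "s3://example-bucket-a/data/",
    "s3://example-bucket-b/data/",
  ]
}

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = var.glue_db_name
  name          = var.crawler_name
  role          = aws_iam_role.glue.id

  # Assumption: run at minute 0 of every hour (Glue uses six-field cron expressions).
  schedule = "cron(0 * * * ? *)"

  # One s3_target block is generated per element of the list.
  dynamic "s3_target" {
    for_each = var.data_source_paths
    content {
      path = s3_target.value
    }
  }
}
```

Adding or removing a path then only means editing the list, and terraform plan should show the corresponding s3_target blocks being added or removed.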
Upvotes: 9