Steven

Reputation: 3294

Adding multiple S3 paths to glue crawler with terraform

I'm building out some infrastructure in AWS with Terraform. I have created several S3 buckets and want a Glue crawler to crawl them once per hour. My Terraform Glue catalog database, role, and policy all build fine, but when I try to create the crawler resource with four S3 paths in the s3_target{} block of the crawler, I get a failure:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}" 

  s3_target {
#    count = "${length(var.data_source_path)}"
    path = "${var.data_source_path}"#"${formatlist("%s", var.data_source_path)}"
  }
}

This causes an error:

Error: aws_glue_crawler.datalake_crawler: s3_target.0.path must be a single value, not a list

I have tried adding a count statement in the s3_target but this fails. I have also tried adding

"${formatlist("%s", var.data_source_path)}"

in the path argument but this too fails.

Can I add multiple s3 paths to a Glue Crawler with Terraform? I can make this happen through the AWS console but this needs to be done using infrastructure as code.

Upvotes: 5

Views: 7108

Answers (1)

ydaetskcoR

Reputation: 56987

To target additional S3 paths, you can repeat the s3_target block, once per path, like this:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}" 

  s3_target {
    path = "${var.data_source_path_1}"
  }

  s3_target {
    path = "${var.data_source_path_2}"
  }
}

This is briefly alluded to in the aws_glue_crawler resource docs, where it says:

s3_target (Optional) List nested Amazon S3 target arguments. See below.

You can also see this in the source code for the resource's schema:

        "s3_target": {
            Type:     schema.TypeList,
            Optional: true,
            MinItems: 1,
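With the static approach from the first example, each path needs its own variable. A minimal sketch of the matching declarations in 0.11 syntax follows; the default bucket names are hypothetical placeholders, not values from the question.

```hcl
# Terraform 0.11 syntax: one string variable per crawled S3 path.
# The default bucket names below are hypothetical placeholders.
variable "data_source_path_1" {
  type    = "string"
  default = "s3://example-bucket-one/data/"
}

variable "data_source_path_2" {
  type    = "string"
  default = "s3://example-bucket-two/data/"
}
```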

Unfortunately, before Terraform 0.12 you can't build these blocks programmatically by looping over a list of paths; you need to declare each one statically.
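If you want to keep the single list variable from the question on 0.11, one possible workaround is to index it statically, with one s3_target block per element. This sketch assumes the list always holds at least four elements (matching the four paths in the question); indexing past the end of the list produces an error, and any elements beyond the fourth are simply not crawled.

```hcl
# Terraform 0.11 syntax. Assumes var.data_source_path is a list with
# at least four elements; each block picks one element by index.
variable "data_source_path" {
  type = "list"
}

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = "${var.glue_db_name}"
  name          = "${var.crawler_name}"
  role          = "${aws_iam_role.glue.id}"

  s3_target {
    path = "${var.data_source_path[0]}"
  }

  s3_target {
    path = "${var.data_source_path[1]}"
  }

  s3_target {
    path = "${var.data_source_path[2]}"
  }

  s3_target {
    path = "${var.data_source_path[3]}"
  }
}
```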

Terraform 0.12 introduced HCL2, which has better support for loops (beyond count), including dynamic blocks. These let you generate one s3_target block per path, something like this:

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = var.glue_db_name
  name          = var.crawler_name
  role          = aws_iam_role.glue.id

  dynamic "s3_target" {
    for_each = var.data_source_paths

    content {
      # The iterator is an object; .value holds the current list element.
      path = s3_target.value
    }
  }
}
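For completeness, here is a sketch of a matching 0.12 variable declaration, plus the hourly schedule mentioned in the question. The bucket names are hypothetical, and the cron expression assumes AWS's six-field format (minutes, hours, day of month, month, day of week, year); check it against the Glue scheduling docs before relying on it. It also uses the optional iterator argument, which renames the temporary s3_target symbol used inside content (the name src is an arbitrary choice; path is avoided because Terraform already has a built-in path object).

```hcl
# Terraform 0.12+ syntax.
# Hypothetical example paths; replace with your own buckets.
variable "data_source_paths" {
  type = list(string)
  default = [
    "s3://example-bucket-one/data/",
    "s3://example-bucket-two/data/",
  ]
}

resource "aws_glue_crawler" "datalake_crawler" {
  database_name = var.glue_db_name
  name          = var.crawler_name
  role          = aws_iam_role.glue.id

  # Run once per hour, at minute 0 (AWS cron format).
  schedule = "cron(0 * * * ? *)"

  dynamic "s3_target" {
    for_each = var.data_source_paths
    # Rename the per-element iterator from the default "s3_target" to "src".
    iterator = src

    content {
      # src.value is the current list element (an S3 URI string).
      path = src.value
    }
  }
}
```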

Upvotes: 9
