Jan

Reputation: 151

How to configure the Terraform Databricks provider when deploying multiple Databricks workspaces on Azure

For my current project I have deployed a single Databricks workspace on Azure with Terraform, and I am now trying to create an additional one. My code repository is organized into several modules, and I want to add further workspaces to my Azure subscription using Terraform. However, things are not working out the way I want them to: I cannot create clusters for the different workspaces. Creating the Databricks workspaces themselves is not the issue; creating the associated databricks provider resources is.

Directory structure (I have other modules but those are not included as they are not relevant for this question):

 .
 |-config
 | |-dev.tfvars
 |-main.tf
 |-outputs.tf
 |-providers.tf
 |-variables.tf
 |-modules
 | |-db-cluster
 | | |-main.tf
 | | |-outputs.tf
 | | |-variables.tf
 | |-dbw
 | | |-main.tf
 | | |-outputs.tf
 | | |-variables.tf
 | |-network
 | | |-main.tf
 | | |-outputs.tf
 | | |-variables.tf

Making use of these modules, I use a main.tf file in the root folder to declare the module blocks, which then create the relevant resources (based on arguments such as count).

For example, the main.tf file below declares the Databricks workspace module (dbw) and the Databricks cluster module (db-cluster).

# ./main.tf file in the root module
# Databricks workspace
module "dbw-default" {
  count                                 = length(var.dbw-names)
  source                                = "./modules/dbw"
  dbw-name                              = var.dbw-names[count.index]
  dbw-project                           = var.project
  dbw-env                               = var.env
  dbw-resource-group-name               = module.rg-default[index(var.rg-names, "databricks")].name
  dbw-location                          = var.location
  dbw-sku                               = var.dbw-sku
  dbw-tags                              = merge(var.tags, { "purpose" = "databricks", "env" = var.env })
}
# Databricks Cluster
module "db-cluster-default" {
  source             = "./modules/db-cluster"
  db-cluster-name    = var.db-cluster-name
  db-cluster-env     = var.env
  db-cluster-project = var.db-cluster-project

  db-cluster-tags                    = merge(var.tags, { "purpose" = "databricks", "env" = var.env })
  db-cluster-min-workers             = var.db-cluster-min-workers
  db-cluster-max-workers             = var.db-cluster-max-workers
  db-cluster-autotermination-minutes = var.db-cluster-autotermination-minutes

}
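
For context, the workspace count is driven by a list variable in the root variables.tf, along these lines (a simplified sketch; the exact declaration is omitted here):

# ./variables.tf (sketch)
variable "dbw-names" {
  type        = list(string)
  description = "One name per Databricks workspace to create"
}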

The root module blocks above, combined with the main.tf in the dbw module (detailed below), create the Databricks workspaces using the azurerm provider.

# ./modules/dbw/main.tf
resource "azurerm_databricks_workspace" "default" {
  name                = format("dbw-%s-%s-%s", var.dbw-name, var.dbw-project, var.dbw-env)
  resource_group_name = var.dbw-resource-group-name
  location            = var.dbw-location
  sku                 = var.dbw-sku
  tags                = var.dbw-tags
}
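
Since the root providers.tf (shown further below) references module.dbw-default.id, the dbw module also exposes the workspace ID as an output, roughly:

# ./modules/dbw/outputs.tf (simplified sketch)
output "id" {
  value = azurerm_databricks_workspace.default.id
}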

Finally, we have the Databricks cluster module, where the databricks provider requirement is declared again.

# ./modules/db-cluster/main.tf
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.6"
    }
  }
}

resource "databricks_cluster" "shared_autoscaling" {
  cluster_name            = format("db-cluster-%s-%s", var.db-cluster-project, var.db-cluster-env)
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = var.db-cluster-autotermination-minutes
  autoscale {
    min_workers = var.db-cluster-min-workers
    max_workers = var.db-cluster-max-workers
  }
}
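
The spark_version and node_type values come from two data sources not shown above; they follow the standard pattern from the Databricks provider documentation:

# ./modules/db-cluster/main.tf (continued)
# Smallest available node type with a local disk
data "databricks_node_type" "smallest" {
  local_disk = true
}

# Latest long-term-support Spark runtime
data "databricks_spark_version" "latest_lts" {
  long_term_support = true
}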

The required_providers block above corresponds to the providers block in the root module:

# ./providers.tf
# providers and versions
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.29"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.6"
    }
  }

  backend "azurerm" {
    #   environment variables are available from YAML pipeline to authenticate using service principal.
  }
}

provider "azurerm" {
  features {}
}

provider "databricks" {
  azure_workspace_resource_id = module.dbw-default.id
}

So when switching to multiple module instances, I changed module.dbw-default.id to module.dbw-default[0].id in this provider block, but that did not work. Completely leaving out the reference here and only setting it at the db-cluster module level did not work either. I get the following error message:

Error: cannot read cluster: cannot configure azure-client-secret auth: cannot get workspace: please set `azure_workspace_resource_id` provider argument.

At first I tried to refer to module.dbw-default[0].id and tried several variations of this, but it does not work. I also tried moving the terraform provider block for databricks into the db-cluster module and passing the dbw-id to reference in the provider there, but this did not work either. I would greatly appreciate any help! I want to emphasize that this all works fine with a single workspace, but no longer with multiple workspaces.

Upvotes: 1

Views: 3753

Answers (1)

Jan

Reputation: 151

OK, so this drove me down a bit of a rabbit hole. The short answer is that it is not possible to configure providers dynamically. Provider configurations must be known statically, so using count or for_each to create the Databricks workspaces means you cannot dynamically create the clusters for those workspaces (or configure anything else in them) via the databricks provider. This is a limitation of Terraform itself; for more details see the long-standing GitHub issue on supporting for_each in provider configurations.

Now, how to move forward: if we have a limited number of workspaces, we can define them explicitly (e.g. dbw-default-first, dbw-default-second) by simply copying the module block in the root main.tf file. Then, in providers.tf, we can list multiple copies of the databricks provider with different aliases:

provider "databricks" {
  alias                       = "first"
  azure_workspace_resource_id = module.dbw-default-first.id
}

provider "databricks" {
  alias                       = "second"
  azure_workspace_resource_id = module.dbw-default-second.id
}
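
Here the two workspace modules are the explicit copies mentioned above, roughly:

# ./main.tf - one explicit module block per workspace, no count
module "dbw-default-first" {
  source   = "./modules/dbw"
  dbw-name = var.dbw-names[0]
  # ... remaining arguments as in the original dbw-default block
}

module "dbw-default-second" {
  source   = "./modules/dbw"
  dbw-name = var.dbw-names[1]
  # ... remaining arguments as in the original dbw-default block
}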

Afterwards, when configuring the clusters or other Databricks resources in main.tf, we pass the specific provider inside each module block:

providers = {
  databricks = databricks.first
}
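
In context, this providers argument sits inside the module block; a sketch for the first cluster:

# ./main.tf - pin this cluster module to the first workspace's provider
module "db-cluster-first" {
  source = "./modules/db-cluster"
  providers = {
    databricks = databricks.first
  }
  db-cluster-name    = var.db-cluster-name
  db-cluster-env     = var.env
  db-cluster-project = var.db-cluster-project
  # ... remaining arguments as in the original db-cluster-default block
}

Because ./modules/db-cluster already declares databricks in its required_providers block, passing the aliased configuration under the default name databricks is all that is needed; the module itself does not need its own provider block.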

We do the same for the second cluster module, pointing it at databricks.second. The unfortunate thing about this solution is that every copy has to be hard-coded, which is ugly, but since providers cannot be configured dynamically it is the only viable option.

Upvotes: 2
