Brian

Reputation: 976

Databricks asset bundle clusters for dev and prod

I am using Databricks bundles, and I have a dev and a prod environment. I have a YAML file that looks something like this:

# yaml-language-server: $schema=bundle_config_schema.json
bundle:
  name: baby-names

resources:
      tasks:
        - task_key: retrieve-baby-names-task
          existing_cluster_id: 1234
          notebook_task:
            notebook_path: ./retrieve-baby-names.py

targets:
  development:
    workspace:
      host: <workspace-url>
  production:
    workspace:
      host: <workspace-url>

This works great if you have the same cluster ID in every environment, but I don't, and I see that Jinja is not supported. How can I add some logic that lets me deploy to environment A with the cluster ID for that environment, and to environment B with its own cluster ID? This seems fundamental.

I have tried manually copying and pasting the new IDs, which isn't what I want to do.

Upvotes: 1

Views: 1185

Answers (2)

Santos Saenz Ferrero

Reputation: 31

You can retrieve the cluster ID with a lookup variable block, using the name of the cluster. Every time you target a specific environment by running databricks bundle with the -t flag, the variable resolves to the ID of the cluster in that workspace whose name matches the one you provided. It looks like this:

variables:
  cluster_id:
    description: Cluster ID for the given name 
    lookup:
      cluster: "<cluster_name>"

You can then reference this variable using the interpolation provided by Databricks Asset Bundles: ${var.cluster_id}
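
If the cluster names differ between environments, or you would rather pin the IDs directly, a variable can also be given a different value per target. The following is only a minimal sketch of how that could fit into the bundle from the question: the job key retrieve-baby-names-job and the <dev-cluster-id> / <prod-cluster-id> placeholders are assumptions for illustration, not values taken from the question.

variables:
  cluster_id:
    description: ID of the existing cluster the task should run on

resources:
  jobs:
    # Hypothetical job key and name; use whatever your bundle already defines.
    retrieve-baby-names-job:
      name: retrieve-baby-names-job
      tasks:
        - task_key: retrieve-baby-names-task
          # Resolved per target at deploy time.
          existing_cluster_id: ${var.cluster_id}
          notebook_task:
            notebook_path: ./retrieve-baby-names.py

targets:
  development:
    variables:
      cluster_id: "<dev-cluster-id>"   # placeholder, replace with your dev cluster ID
  production:
    variables:
      cluster_id: "<prod-cluster-id>"  # placeholder, replace with your prod cluster ID

With something like this in place, running databricks bundle deploy -t development should pick up the development value, and -t production the production one, so the IDs no longer need to be pasted in by hand.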

Upvotes: 3

Brian

Reputation: 976

The best solution I found was to use the Jinja package in Python: my build tool has a task that dynamically generates the YAML with the values for the target environment.

Upvotes: 1
