vw96

Reputation: 41

Guidance Needed for Developing CI/CD Process in Databricks Using Azure DevOps

I am working on setting up a complete end-to-end CI/CD process for my Databricks environment using Azure DevOps. So far, I have developed a build pipeline that produces a Databricks Asset Bundle (DAB) artifact.

Now, I need to create a release pipeline to deploy this artifact into production. My plan is to use the artifact from the build pipeline and the Databricks REST API to push it into production.

Questions:

  1. Will this approach publish workflows and notebooks into production exactly as they are in the development environment?
  2. Are there any best practices or recommendations for structuring the release pipeline?

I am new to this and would appreciate any suggestions.

Below is the code I’m currently using in the release pipeline.


Release Pipeline Code:

# Define Databricks variables
$databricksUrl = "<Databricks-URL>" # Replace with your Databricks instance URL
$accessToken = "<Access-Token>" # Replace with your secure token

# Define headers for Databricks REST API
$headers = @{
    "Authorization" = "Bearer $accessToken"
}

# Paths inside the Databricks workspace
$workspaceBasePath = ""
$notebookPath = ""
$jobPath = ""

# Function to create directories in Databricks
function Create-Directory {
    param ([string]$directoryPath)
    $createDirUri = "$databricksUrl/api/2.0/workspace/mkdirs"
    $body = @{ "path" = $directoryPath }
    
    try {
        Invoke-RestMethod -Method POST -Uri $createDirUri -Headers $headers -Body ($body | ConvertTo-Json -Depth 10) -ContentType "application/json"
        Write-Output "Directory '$directoryPath' created successfully in Databricks."
    } catch {
        # Ignore 400 responses (e.g. the path already exists); surface any other failure
        if ($_.Exception.Response.StatusCode -ne 400) {
            Write-Error "Failed to create directory '$directoryPath': $_"
        }
    }
}

# Additional functions (Delete-File, Import-Notebook, Import-Job) are implemented similarly to handle file deletions and imports.
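
# --- Hypothetical sketches of those helper functions (not the exact implementations) ---
# These assume the standard /api/2.0/workspace/delete and /api/2.0/workspace/import
# endpoints and reuse the $databricksUrl / $headers variables defined above.

function Delete-File {
    param ([string]$filePath)
    $deleteUri = "$databricksUrl/api/2.0/workspace/delete"
    $body = @{ "path" = $filePath; "recursive" = $false }
    try {
        Invoke-RestMethod -Method POST -Uri $deleteUri -Headers $headers -Body ($body | ConvertTo-Json) -ContentType "application/json"
        Write-Output "Deleted '$filePath' from the workspace."
    } catch {
        # The object may not exist yet on a first deployment; don't fail the release for that
        Write-Output "Could not delete '$filePath' (it may not exist yet)."
    }
}

function Import-Notebook {
    param ([string]$notebookPath, [string]$workspacePath)
    $importUri = "$databricksUrl/api/2.0/workspace/import"
    # The import API expects the notebook source as a base64-encoded string
    $content = [Convert]::ToBase64String([IO.File]::ReadAllBytes($notebookPath))
    $body = @{
        "path"      = $workspacePath
        "format"    = "SOURCE"
        "language"  = "PYTHON"   # assumption: change to SQL/SCALA/R to match the notebook
        "content"   = $content
        "overwrite" = $true
    }
    Invoke-RestMethod -Method POST -Uri $importUri -Headers $headers -Body ($body | ConvertTo-Json -Depth 10) -ContentType "application/json"
    Write-Output "Imported notebook into '$workspacePath'."
}

# Import-Job would similarly read the job-config JSON and post it to the Jobs API
# (jobs/create for a new job, jobs/reset to update an existing one); omitted here.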

# Example pipeline steps:
Create-Directory -directoryPath "$workspaceBasePath/notebooks"
Create-Directory -directoryPath "$workspaceBasePath/jobs"

Delete-File -filePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Delete-File -filePath "$workspaceBasePath/jobs/job-config.json"

Import-Notebook -notebookPath $notebookPath -workspacePath "$workspaceBasePath/notebooks/Contingent_Employee_Report"
Import-Job -jobConfigJsonPath $jobPath

Thank you in advance for your time and suggestions!

Upvotes: 0

Views: 94

Answers (1)

Ikram M.

Reputation: 21

Will this approach publish workflows and notebooks into production exactly as they are in the development environment?

We don't have the definitions of the Import-Job and Import-Notebook functions mentioned in your code, but if they call the Jobs and Workspace REST APIs (e.g. jobs/create and workspace/import), this approach should work, as long as you handle updating an existing job instead of always creating a new one.

It will publish the workflows and notebooks exactly as they are defined in your repository / current branch.
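
For the job part, a minimal "create or update" sketch could look like the following (reusing the $databricksUrl, $headers and $jobPath variables from your script, and assuming your job-config.json contains the job settings including its name):

# Hypothetical sketch: create the job if it does not exist, otherwise reset (update) it.
$jobSettings = Get-Content $jobPath -Raw | ConvertFrom-Json

# Look up an existing job with the same name (the Jobs 2.1 list endpoint supports a name filter)
$existing = Invoke-RestMethod -Method GET `
    -Uri "$databricksUrl/api/2.1/jobs/list?name=$([uri]::EscapeDataString($jobSettings.name))" `
    -Headers $headers

if ($existing.jobs -and $existing.jobs.Count -gt 0) {
    # Update the existing job in place with jobs/reset
    $body = @{ "job_id" = $existing.jobs[0].job_id; "new_settings" = $jobSettings }
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.1/jobs/reset" `
        -Headers $headers -Body ($body | ConvertTo-Json -Depth 20) -ContentType "application/json"
} else {
    # No job with that name yet: create it
    Invoke-RestMethod -Method POST -Uri "$databricksUrl/api/2.1/jobs/create" `
        -Headers $headers -Body ($jobSettings | ConvertTo-Json -Depth 20) -ContentType "application/json"
}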

Are there any best practices or recommendations for structuring the release pipeline?

Since you already have a DAB structure in your repository, and you deploy your notebooks and jobs at the same time, you can simply use curl to install the Databricks CLI and then run databricks bundle validate followed by databricks bundle deploy, without needing to handle directory creation and file deletion yourself.
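
For example, the whole release step can boil down to something like this (a minimal sketch, assuming a Linux build agent, a "prod" target defined in your databricks.yml, and DATABRICKS_HOST / DATABRICKS_TOKEN exposed as pipeline variables/secrets for authentication):

# Install the Databricks CLI with the official install script
# (may require elevated permissions depending on the agent)
sh -c "curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh"

# Validate the bundle configuration, then deploy it to the production target
databricks bundle validate -t prod
databricks bundle deploy -t prod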

Documentation: Databricks Asset Bundles on Databricks

Upvotes: 0
