Abhra Ray

Reputation: 139

Databricks command not found in Azure DevOps pipeline

I am trying to copy a file to Azure Databricks DBFS through an Azure DevOps pipeline. The following is a snippet from the YAML file I am using:

stages:
- stage: MYBuild
  displayName: "My Build"
  jobs:
    - job: BuildwhlAndRunPytest
      pool:
        vmImage: 'ubuntu-16.04'

      steps:
      - task: UsePythonVersion@0
        displayName: 'Use Python 3.7'
        inputs:
          versionSpec: '3.7'
          addToPath: true
          architecture: 'x64'

      - script: |
          pip install pytest requests setuptools wheel pytest-cov
          pip install -U databricks-connect==7.3.*
        displayName: 'Load Python Dependencies'

      - checkout: self
        persistCredentials: true
        clean: true

      - script: |
          echo "y
          $(databricks-host)
          $(databricks-token)
          $(databricks-cluster)
          $(databricks-org-id)
          8787" | databricks-connect configure
          databricks-connect test
        env:
          databricks-token: $(databricks-token)
        displayName: 'Configure DBConnect'

      - script: |
          databricks fs cp test-proj/pyspark-lib/configs/config.ini dbfs:/configs/test-proj/config.ini

I get the following error at the stage where I am invoking the databricks fs cp command:

/home/vsts/work/_temp/2278f7d5-1d96-4c4e-a501-77c07419773b.sh: line 7: databricks: command not found

However, when I run databricks-connect test, that command executes successfully. Am I missing a step somewhere?

Upvotes: 3

Views: 3043

Answers (1)

Alex Ott

Reputation: 87069

The databricks command is located in the databricks-cli package, not in databricks-connect, so you need to change your pip install command.
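If you want to verify this on the build agent, a quick diagnostic step could look like the sketch below; this step and its displayName are my own addition, not part of the original answer:

- script: |
    # confirm whether the databricks-cli package and the databricks entry point are present
    pip show databricks-cli || echo "databricks-cli is not installed"
    which databricks || echo "databricks executable not on PATH"
  displayName: 'Check for Databricks CLI'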

Also, for the databricks command you can just set the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables and it will work, like this:

- script: |
    pip install pytest requests setuptools wheel
    pip install -U databricks-cli
  displayName: 'Load Python Dependencies'

- script: |
    databricks fs cp ... dbfs:/...
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
  displayName: 'Copy artifacts'
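Applied to the pipeline from the question, the dependency and copy steps might look like the sketch below. It keeps databricks-connect for the tests, adds databricks-cli for the dbfs copy, and assumes databricks-host and databricks-token are defined as pipeline variables (with the token marked as secret). This is an adaptation of the answer above, not a verbatim part of it:

- script: |
    pip install pytest requests setuptools wheel pytest-cov
    pip install -U databricks-connect==7.3.*
    # databricks-cli provides the `databricks` command used by `databricks fs cp`
    pip install -U databricks-cli
  displayName: 'Load Python Dependencies'

- script: |
    databricks fs cp test-proj/pyspark-lib/configs/config.ini dbfs:/configs/test-proj/config.ini
  env:
    # the CLI reads the workspace URL and token from these environment variables
    DATABRICKS_HOST: $(databricks-host)
    DATABRICKS_TOKEN: $(databricks-token)
  displayName: 'Copy config.ini to DBFS'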

P.S. Here is an example of how to do CI/CD on Databricks with notebooks. You may also be interested in the cicd-templates project.

Upvotes: 3
