Reputation: 137
I'm attempting to automatically pull the latest version of a GitHub repo into my Databricks workspace every time a new push is made to the repo. Everything works fine until the Databricks CLI prompts for the host URL, at which point the job fails with "Error: Process completed with exit code 1." I assume the issue is that my token and host credentials, stored as secrets, are not being loaded into the environment properly. According to Databricks, "CLI 0.8.0 and above supports the following environment variables: DATABRICKS_HOST, DATABRICKS_USERNAME, DATABRICKS_PASSWORD, DATABRICKS_TOKEN". I've added both DATABRICKS_HOST and DATABRICKS_TOKEN as repository secrets, so I'm not sure what I'm doing wrong.
on:
  push:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.8 # install the Python version needed
      - name: execute py
        env:
          DATABRICKS_HOST: $(DATABRICKS_HOST)
          DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
        run: |
          python -m pip install --upgrade databricks-cli
          databricks configure --token
          databricks repos update --repo-id REPOID-ENTERED --branch "Development"
The error:
Successfully built databricks-cli
Installing collected packages: tabulate, certifi, urllib3, six, pyjwt, oauthlib, idna, click, charset-normalizer, requests, databricks-cli
Successfully installed certifi-2021.10.8 charset-normalizer-2.0.12 click-8.1.3 databricks-cli-0.16.6 idna-3.3 oauthlib-3.2.0 pyjwt-2.4.0 requests-2.27.1 six-1.16.0 tabulate-0.8.9 urllib3-1.26.9
WARNING: You are using pip version 22.0.4; however, version 22.1 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.8.12/x64/bin/python -m pip install --upgrade pip' command.
Aborted!
Databricks Host (should begin with https://):
Error: Process completed with exit code 1.
Upvotes: 1
Views: 2058
Reputation: 1493
I think calling the API directly, without using the client, works best. Below is code that works from Azure DevOps; it should also work in a GitHub Action.
from adal import AuthenticationContext

user_parameters = {
    "tenant": "$(SP_TENANT_ID)",
    "client_id": "$(SP-CLIENT-ID)",
    "redirect_uri": "http://localhost",
    "client_secret": "$(SP-CLIENT-SECRET)"
}

authority_host_url = "https://login.microsoftonline.com/"
azure_databricks_resource_id = "put_here"
authority_url = authority_host_url + user_parameters['tenant']

# Supply the refresh_token (whose default lifetime is 90 days or longer).
def refresh_access_token(refresh_token):
    context = AuthenticationContext(authority_url)
    token_response = context.acquire_token_with_refresh_token(
        refresh_token,
        user_parameters['client_id'],
        azure_databricks_resource_id,
        user_parameters['client_secret'])
    # The new 'refreshToken' and 'accessToken' are returned.
    return (token_response['refreshToken'], token_response['accessToken'])

(refresh_token, access_token) = refresh_access_token("$(AAD-REFRESH-TOKEN)")
print('##vso[task.setvariable variable=ACCESS_TOKEN;]%s' % (access_token))
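The final `print` uses an Azure-DevOps-specific logging command (`##vso[task.setvariable]`). In a GitHub Action, the closest equivalent is appending to the file named by the `GITHUB_ENV` environment variable, which the runner sets automatically. A minimal sketch (the function name is mine, not part of either platform's API):

```python
import os

def export_github_env_var(name: str, value: str) -> None:
    """Append NAME=VALUE to the file GitHub Actions reads to pass
    environment variables to later steps in the same job."""
    env_file = os.environ["GITHUB_ENV"]  # set automatically by the runner
    with open(env_file, "a") as fh:
        fh.write(f"{name}={value}\n")

# e.g. export_github_env_var("ACCESS_TOKEN", access_token)
```

Note this only works for single-line values such as tokens; multiline values need the runner's heredoc-style delimiter syntax.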
- bash: |
    # Update the repo to the given branch
    echo 'Patching repo $(DB_WORKSPACE_HOST)/$(REPO_ID)'
    echo 'https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID) $(Build.SourceBranchName)'
    curl -n -X PATCH -o "/tmp/db_patch-out.json" https://$(DB_WORKSPACE_HOST)/api/2.0/repos/$(REPO_ID) \
      -H 'Authorization: Bearer $(ACCESS_TOKEN)' \
      -d '{"branch": "$(Build.SourceBranchName)"}'
    cat "/tmp/db_patch-out.json"
    grep -v error_code "/tmp/db_patch-out.json"
  displayName: 'Update Databricks Repo'
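The curl PATCH above can also be issued directly from Python, which avoids shell quoting around the JSON body. A rough stdlib-only sketch, assuming the same workspace host, repo ID, and branch inputs (the helper names are illustrative, not part of the Databricks API):

```python
import json
import urllib.request

def build_repo_update_request(host: str, repo_id: int, branch: str):
    """Build the URL and JSON body for the Repos API PATCH call."""
    url = f"https://{host}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode()
    return url, body

def update_repo(host: str, repo_id: int, branch: str, token: str):
    """PATCH the repo to the given branch and return the parsed response."""
    url, body = build_repo_update_request(host, repo_id, branch)
    req = urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```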
This works if there is network connectivity to Databricks from your Git provider. If you have ADF on the same network and do not have direct connectivity, you can (1) spin up an API gateway to secure and bridge your network calls, or (2) trigger ADF asynchronously and have it call Databricks, for example by dropping a file in Azure Storage (https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger?tabs=data-factory), sending an email, or using another event trigger.
While the above methods work if there is a true IP address restriction, the issue with the call may just be that the CA certificates are not being verified correctly. You can work around this locally using pip-system-certs, or by exporting the certificate from your browser and specifying the PEM file.
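For the exported-PEM route, any `requests`-based caller can point TLS verification at the exported bundle instead of the default certifi one. A small sketch (the path and function name are placeholders):

```python
import requests

def databricks_session(pem_path: str) -> requests.Session:
    """Session that verifies TLS against an explicitly exported
    certificate bundle instead of the default bundle."""
    s = requests.Session()
    s.verify = pem_path  # path to the PEM exported from your browser
    return s

# e.g. databricks_session("/tmp/corp-ca.pem").get("https://<workspace-host>/api/2.0/repos")
```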
Upvotes: 1