Chibolete Sophos
Chibolete Sophos

Reputation: 67

How to connect to Azure Databricks' Hive using a SQLAlchemy from a third party app using a service principal?

I want to connect Superset to a Databricks for querying the tables. Superset uses SQLAlchemy to connect to databases which requires a PAT (Personal Access Token) to access.

It is possible to connect and run queries when I use the PAT I generated on my account through Databricks web UI? But I do not want to use my personal token in a production env. Even so, I was not able to find how to generate a PAT like token for a Service Principal.

The working SQLAlchemy URI is looks like this:

databricks+pyhive://token:[email protected]:443/default?http_path=sql%2Fprotocolv1%qqq%wwwwwwwwwww1%eeeeeeee-1111111-foobar00

After checking the Azure docs, there are two ways on how to run queries between Databricks and another service:

For the first and preferred method, I was able to advance, but I was not able to generate the Service Principal's PAT: I was able to register an app on Azure's AD. So I got the tenant ID, client ID and create a secret for the registered app. With this info, I was able to curl Azure and receive a JWT token for that app. But all the tokens referred in the docs are JTW's OAUTH2 tokens, which does not seems to work with SQLAlchemy URI.

I know it's possible to generate a PAT for a Service Principal since there is a mention on how to read, update and delete a Service Principal's PAT on the documentation. But it has no information on how to create a PAT for a Service Principal.

I prefer to avoid using the second method (creating an AD user for Superset) since I am not allowed to create/manage users for the AD.

In summary, I have a working SQLAlchemy URI, but I want to use a generated token, associated with a Service Principal, instead of using my PAT. But I can't find how to generate that token (I only found documentation on how to generate OAUTH2 tokens).

Upvotes: 1

Views: 683

Answers (1)

Alex Ott
Alex Ott

Reputation: 87259

You can create PAT for service principal as following (examples are taken from docs, do export DATABRICKS_HOST="https://hostname" before executing):

  • Add service principal into the Databricks workspace using SCIM API (doc):
curl -X POST '$DATABRICKS_HOST/api/2.0/preview/scim/v2/ServicePrincipals' \
  --header 'Content-Type: application/scim+json' \
  --header 'Authorization: Bearer <personal-access-token>' \
  --data-raw '{
    "schemas":[
      "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"
    ],
    "applicationId":"<application-id>",
    "displayName": "test-sp",
    "entitlements":[
      {
        "value":"allow-cluster-create"
      }
    ]
  }'
  • Get AAD Token for service principal (doc, another option is to use az-cli):
export DATABRICKS_TOKEN=$(curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
-d 'grant_type=client_credentials&client_id=<client-id>&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d&client_secret=<application-secret>' \
https://login.microsoftonline.com/<tenant-id>/oauth2/token|jq -r .accessToken)
  • Generate token using the AAD Token (doc):
curl -s -n -X POST "$DATABRICKS_HOST/api/2.0/token/create" --data-raw '{
  "lifetime_seconds": 100,
  "comment": "token for superset"
}' -H "Authorization: Bearer $DATABRICKS_TOKEN"

Upvotes: 1

Related Questions