soMuchToLearnAndShare

Reputation: 1035

'Pimp' the Airflow DatabricksHook, or some Python library, to create a cluster and get the cluster_id for downstream tasks

I have a question similar to the one below, but I wonder whether there is an existing library that works nicely with Airflow to create a Databricks cluster, return its cluster_id, and reuse that id in downstream tasks.

Triggering Databricks job from Airflow without starting new cluster

My research so far: the DatabricksHook class has quite a few nice methods and API calls, but it has no call to create a cluster and then reuse that cluster within the same DAG.

If I have to add the methods myself: in Scala and some other languages, one can 'pimp' a library to add new methods to a third-party class. Any suggestions for an elegant way to add extra methods in Python?
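For context, Python has no direct equivalent of Scala's implicit classes; the usual routes seem to be subclassing or monkey-patching. Here is a rough sketch of the subclassing route I am considering. It relies on the hook's internal _do_api_call helper, which is not a public API and may change between Airflow versions:

    from airflow.contrib.hooks.databricks_hook import DatabricksHook

    # Databricks REST endpoint for creating a cluster.
    CREATE_CLUSTER_ENDPOINT = ('POST', 'api/2.0/clusters/create')


    class ExtendedDatabricksHook(DatabricksHook):
        """Sketch: DatabricksHook subclass with an extra create_cluster method."""

        def create_cluster(self, cluster_spec):
            # _do_api_call is an internal helper of the upstream hook,
            # so this could break across Airflow versions.
            response = self._do_api_call(CREATE_CLUSTER_ENDPOINT, cluster_spec)
            return response['cluster_id']

The alternative would be monkey-patching the method onto DatabricksHook at import time, but subclassing seems cleaner and easier to test.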

Info:

Upvotes: 0

Views: 219

Answers (1)

Jarek Potiuk

Reputation: 20097

Just a note: Airflow 1.10 reached end-of-life on June 17, so you should switch to Airflow 2 as soon as possible, as there will be no further improvements, nor even critical security fixes, for 1.10.

In Airflow 2 you have the Databricks provider, and its DatabricksHook has more methods (start_cluster/terminate_cluster). See https://airflow.apache.org/docs/apache-airflow-providers-databricks/stable/_api/airflow/providers/databricks/hooks/databricks/index.html#airflow.providers.databricks.hooks.databricks.DatabricksHook.restart_cluster for an example.
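For illustration, a minimal sketch of calling those hook methods directly, assuming a configured 'databricks_default' connection; 'existing-cluster-id' is a placeholder for a real cluster id:

    from airflow.providers.databricks.hooks.databricks import DatabricksHook

    hook = DatabricksHook(databricks_conn_id='databricks_default')

    # Both methods take the Databricks Clusters API request body as a dict.
    hook.start_cluster({'cluster_id': 'existing-cluster-id'})
    hook.terminate_cluster({'cluster_id': 'existing-cluster-id'})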

It seems those methods would cover your use case. You can easily write your own operator on top of those hook methods, along the lines of the sketch below, and possibly contribute the operator back.
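A sketch of such an operator; the class name and task_id are illustrative, not an existing Airflow operator. It starts a cluster and returns its id, which Airflow pushes to XCom so downstream tasks can reuse it:

    from airflow.models import BaseOperator
    from airflow.providers.databricks.hooks.databricks import DatabricksHook


    class DatabricksStartClusterOperator(BaseOperator):
        """Illustrative operator: starts an existing Databricks cluster and
        pushes its cluster_id to XCom for downstream tasks."""

        def __init__(self, cluster_id, databricks_conn_id='databricks_default', **kwargs):
            super().__init__(**kwargs)
            self.cluster_id = cluster_id
            self.databricks_conn_id = databricks_conn_id

        def execute(self, context):
            hook = DatabricksHook(databricks_conn_id=self.databricks_conn_id)
            hook.start_cluster({'cluster_id': self.cluster_id})
            # The return value is pushed to XCom automatically; downstream
            # tasks can read it with ti.xcom_pull(task_ids='start_cluster'),
            # assuming this task's task_id is 'start_cluster'.
            return self.cluster_id

Downstream tasks can then pull the id in a templated field via {{ ti.xcom_pull(task_ids='start_cluster') }}.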

Upvotes: 1
