Reputation: 809
Following this tutorial, I can give the task a path to a notebook I want to run in Databricks:
notebook_task_params = {
    'new_cluster': new_cluster,
    'notebook_task': {
        'notebook_path': '/Users/[email protected]/PrepareData',
    },
}
But is there a way to tell the task to take a Python project (or maybe a path to a wheel file?) from S3 or Artifactory and run it instead of a notebook?
How can I make it work so it will be ready for production? I want to create a process so that after I push my change to the git repo, a CI/CD process will build the project, deploy it to S3/Artifactory, and trigger a Databricks job from Airflow, which consumes the project I deployed. Is it possible?
Upvotes: 2
Views: 1174
Reputation: 379
As Alex Ott said, the python_wheel_task argument is not available yet in DatabricksSubmitRunOperator.
However, you can do it using the (multi-task) tasks argument, like below:
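from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

# cluster_config is the new-cluster spec dict, defined elsewhere in the DAG file.
trigger_wheel = DatabricksSubmitRunOperator(
    task_id='trigger_wheel',
    tasks=[{
        "task_key": 'my_task',
        "python_wheel_task": {
            'package_name': 'wheel_package_name',
            'entry_point': 'wheel_entry_point',
        },
        "new_cluster": cluster_config,
        "libraries": [{"whl": "dbfs:/.../wheel_package_name-0.0.1-py3-none-any.whl"}],
    }],
)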
My example uses DBFS, but of course you can use a wheel that is stored on S3. Just make sure your cluster has access to it.
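As a rough sketch of the S3 variant, the task entry can be built by a small helper that only swaps the wheel location. The helper name `wheel_task`, the bucket `my-bucket`, and the placeholder cluster values are assumptions made up for illustration; only the dictionary keys come from the example above.

```python
from typing import Any, Dict

# Assumption: this sketch only accepts wheels stored on DBFS or S3.
ALLOWED_SCHEMES = ("dbfs:/", "s3://")


def wheel_task(
    task_key: str,
    package_name: str,
    entry_point: str,
    wheel_uri: str,
    new_cluster: Dict[str, Any],
) -> Dict[str, Any]:
    """Build one entry for the `tasks` list passed to DatabricksSubmitRunOperator."""
    if not wheel_uri.startswith(ALLOWED_SCHEMES):
        raise ValueError(f"unsupported wheel location: {wheel_uri}")
    if not wheel_uri.endswith(".whl"):
        raise ValueError(f"not a wheel file: {wheel_uri}")
    return {
        "task_key": task_key,
        "python_wheel_task": {
            "package_name": package_name,
            "entry_point": entry_point,
        },
        "new_cluster": new_cluster,
        # The cluster must be able to read this location (e.g. via an instance profile).
        "libraries": [{"whl": wheel_uri}],
    }


# Hypothetical values: replace with your own cluster spec and bucket.
cluster_config = {
    "spark_version": "<runtime-version>",
    "node_type_id": "<node-type>",
    "num_workers": 1,
}
tasks = [
    wheel_task(
        "my_task",
        "wheel_package_name",
        "wheel_entry_point",
        "s3://my-bucket/wheels/wheel_package_name-0.0.1-py3-none-any.whl",
        cluster_config,
    )
]
```

The resulting `tasks` list can be passed as `tasks=tasks` to the operator. A CI/CD pipeline would only need to upload the built wheel to that location and keep the version in the URI in sync.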
You can also check out Databricks dbx. It lets you develop in an IDE and deploy your package easily as a Databricks job (e.g. a job that executes a wheel).
Upvotes: 1
Reputation: 87214
The Airflow provider for Databricks supports all task/job types provided by the Databricks REST API. If you want to run a Python file, you can use the spark_python_task parameter (doc) to specify a path to the file. If you need to run a wheel file, you can either use the spark_submit_task, or provide a python_wheel_task object inside the json parameter, which is used to fill the data for submission to the REST API, as we don't support this task yet in Airflow. Refer to the Databricks REST API docs for more information on which parameters of this task you need to specify.
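A minimal sketch of the json approach, assuming the payload shape described in the Databricks REST API docs. The helper names (`wheel_run_json`, `python_file_run_json`, `make_operator`) and all paths and cluster values are hypothetical. Whether a top-level python_wheel_task is accepted depends on the Jobs API version your provider release targets, so check the payload against the REST API docs before relying on it.

```python
from typing import Any, Dict, List, Optional


def wheel_run_json(
    package_name: str,
    entry_point: str,
    wheel_uri: str,
    new_cluster: Dict[str, Any],
) -> Dict[str, Any]:
    """Payload for the operator's `json` parameter that runs a wheel entry point."""
    return {
        "new_cluster": new_cluster,
        "python_wheel_task": {
            "package_name": package_name,
            "entry_point": entry_point,
        },
        # The wheel itself is attached as a cluster library.
        "libraries": [{"whl": wheel_uri}],
    }


def python_file_run_json(
    python_file: str,
    new_cluster: Dict[str, Any],
    parameters: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Payload that runs a single Python file through spark_python_task."""
    return {
        "new_cluster": new_cluster,
        "spark_python_task": {
            "python_file": python_file,
            "parameters": list(parameters or []),
        },
    }


def make_operator(task_id: str, payload: Dict[str, Any]):
    """Wrap a payload in a DatabricksSubmitRunOperator."""
    # Imported lazily so the payload helpers above work without Airflow installed.
    from airflow.providers.databricks.operators.databricks import (
        DatabricksSubmitRunOperator,
    )

    return DatabricksSubmitRunOperator(task_id=task_id, json=payload)


# Hypothetical cluster spec and artifact locations.
new_cluster = {
    "spark_version": "<runtime-version>",
    "node_type_id": "<node-type>",
    "num_workers": 1,
}
wheel_payload = wheel_run_json(
    "my_package", "main", "s3://my-bucket/my_package-0.0.1-py3-none-any.whl", new_cluster
)
file_payload = python_file_run_json("dbfs:/scripts/prepare_data.py", new_cluster, ["--date", "2024-01-01"])
```

Inside a DAG definition, `make_operator('run_wheel', wheel_payload)` would create the task. Keeping the payload builders separate from the operator makes them easy to unit test in the CI pipeline.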
Upvotes: 3