Saravanan Ponnaiah

Reputation: 81

Trigger workflow job with Databricks Autoloader

I have a requirement to monitor an S3 bucket for (zip) files being placed in it. As soon as a file lands in the bucket, the pipeline should start processing it. Currently I have a workflow job with multiple tasks that performs the processing. Through a job parameter I have configured the S3 file path and am able to trigger the pipeline, but I need to automate the monitoring with Autoloader. I have set up Databricks Autoloader in another notebook and managed to get the list of files arriving in the S3 path by querying the checkpoint:

checkpoint_query = f"SELECT * FROM cloud_files_state('{checkpoint_path}') ORDER BY create_time DESC LIMIT 1"
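
For context, the Autoloader stream feeding that checkpoint might look roughly like the sketch below; the bucket path, the audit table name, and the use of binaryFile for the zip archives are assumptions.

# Minimal Autoloader sketch; the S3 path and target table are placeholders
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")  # zip archives are read as opaque binary files
    .load("s3://my-bucket/incoming/")
)

(
    stream.select("path", "modificationTime")
    .writeStream.option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)
    .toTable("incoming_file_audit")
)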

I want to integrate this notebook with my job, but I have no clue how to connect it to the pipeline job. Some pointers would be much appreciated.
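
One way to close the loop, sketched below under assumptions (the job ID and the "file_path" parameter name are placeholders): query cloud_files_state for the newest file from the monitoring notebook, then start the existing workflow job through the Jobs API using the databricks-sdk package, passing that path as a notebook parameter.

from databricks.sdk import WorkspaceClient

JOB_ID = 123456789  # hypothetical ID of the existing workflow job

# Pick up the most recently discovered file from the Autoloader checkpoint
latest = spark.sql(
    f"SELECT path FROM cloud_files_state('{checkpoint_path}') "
    "ORDER BY create_time DESC LIMIT 1"
).collect()

if latest:
    w = WorkspaceClient()  # resolves credentials from the notebook environment
    # "file_path" must match the parameter name the job already expects (an assumption here)
    w.jobs.run_now(job_id=JOB_ID, notebook_params={"file_path": latest[0]["path"]})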

Upvotes: 1

Views: 1181

Answers (1)

Amin Movahed

Reputation: 91

You need to create a workflow job with the pipeline as the upstream task and your notebook as the downstream task. Currently there is no way to run custom notebooks within a DLT pipeline.

Check this for how to create a workflow: https://docs.databricks.com/workflows/jobs/jobs.html#job-create
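
A rough sketch of that two-task job using the Databricks Python SDK; the pipeline ID, notebook path, and cluster ID are placeholders, and the job and task names are made up.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.create(
    name="autoloader-pipeline-job",
    tasks=[
        # Upstream task: the DLT pipeline that ingests the files
        jobs.Task(
            task_key="ingest_pipeline",
            pipeline_task=jobs.PipelineTask(pipeline_id="<pipeline-id>"),
        ),
        # Downstream task: the custom notebook, which only runs after the pipeline succeeds
        jobs.Task(
            task_key="post_process",
            depends_on=[jobs.TaskDependency(task_key="ingest_pipeline")],
            notebook_task=jobs.NotebookTask(notebook_path="/path/to/notebook"),
            existing_cluster_id="<cluster-id>",
        ),
    ],
)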

Upvotes: 0
