Reputation: 81
I have a requirement to monitor an S3 bucket for (zip) files being placed in it. As soon as a file lands in the S3 bucket, the pipeline should start processing it. Currently I have a Workflow job with multiple tasks that perform the processing. In the job parameters I have configured the S3 file path and am able to trigger the pipeline manually, but I need to automate the monitoring through Auto Loader. I have set up Databricks Auto Loader in another notebook and managed to get the list of files arriving at the S3 path by querying the checkpoint:
checkpoint_query = f"SELECT * FROM cloud_files_state('{checkpoint_path}') ORDER BY create_time DESC LIMIT 1"
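For reference, the stream in that notebook is along these lines (a simplified sketch, not the exact code; the bucket path and target table name are placeholders, and I read the zips with the binaryFile format since Auto Loader does not unpack archives):

# Minimal Auto Loader stream (placeholder bucket path and table name).
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")  # zip files arrive as opaque binaries
    .load("s3://my-bucket/incoming/")
)

(
    stream.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)  # process whatever has arrived, then stop
    .toTable("incoming_zip_files")
)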
I want to integrate this notebook with my job, but I have no clue how to hook it into the pipeline job. Some pointers would be much appreciated.
Upvotes: 1
Views: 1181
Reputation: 91
You need to create a workflow job with the pipeline as the upstream task and your notebook as the downstream task. Currently there is no way to run custom notebooks within a DLT pipeline.
Check this for how to create a workflow: https://docs.databricks.com/workflows/jobs/jobs.html#job-create
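If you prefer to create the job programmatically instead of through the UI, here is a sketch against the Jobs API 2.1 (the workspace URL, token, pipeline ID, and notebook path are placeholders you would replace with your own values):

import requests

# Placeholders -- substitute your workspace URL, PAT, pipeline ID, and notebook path.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "s3-zip-processing",
    "tasks": [
        {
            # Upstream task: the DLT pipeline running the Auto Loader ingest.
            "task_key": "autoloader_ingest",
            "pipeline_task": {"pipeline_id": "<dlt-pipeline-id>"},
        },
        {
            # Downstream task: the processing notebook, which runs only
            # after the pipeline task succeeds.
            "task_key": "process_files",
            "depends_on": [{"task_key": "autoloader_ingest"}],
            "notebook_task": {"notebook_path": "/Users/<you>/process_files"},
        },
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"job_id": 123456}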
Upvotes: 0