Reputation: 2805
I have been reviewing Databricks Workflows. I have read many blogs on it, but I can't find any reviews comparing Workflows to Azure Data Factory (ADF). It may well be that the two services are completely different, but my impression is that Workflows is positioned to do the job of ADF.
Can someone tell me whether, for example, Workflows has the same number of connectors as ADF? Is it possible to connect to an on-premises SQL Server similar to the way you can connect to an on-premises SQL Server/database with ADF's self-hosted integration runtime?
Upvotes: 0
Views: 35
Reputation: 3215
Azure Data Factory (ADF) and Databricks Workflows serve different purposes and have different capabilities.
Azure Data Factory: Azure Data Factory is mainly used for data integration, migration, and orchestration; it provides a platform to connect to, ingest, and prepare data from multiple sources.
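To make the comparison concrete, here is a minimal sketch of what an ADF copy pipeline looks like as JSON. The pipeline, dataset, and activity names are hypothetical; the structure follows ADF's standard Copy activity schema.

```python
import json

# Hypothetical ADF pipeline definition: copy from an on-prem SQL Server
# dataset (reached via a self-hosted integration runtime) into Parquet files.
# All referenceName values are placeholders for illustration only.
pipeline = {
    "name": "CopySalesFromOnPrem",
    "properties": {
        "activities": [
            {
                "name": "CopyOrdersToLake",
                "type": "Copy",
                "inputs": [{"referenceName": "OnPremOrdersDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "LakeOrdersParquet", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "SqlServerSource"},
                    "sink": {"type": "ParquetSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```

The key point is that the connectivity (here, to an on-prem SQL Server) is handled declaratively by ADF's linked services and integration runtime, not by code you run yourself.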
Databricks Workflows: Databricks Workflows provides a fully managed orchestration solution that is seamlessly integrated into the Databricks platform. It is accessible through multiple interfaces, including the Workflows UI, APIs, and the Databricks CLI. This lets users design, execute, monitor, and troubleshoot data pipelines without the burden of managing infrastructure. With built-in monitoring features, such as table and matrix views of workflow runs, it allows for quick issue identification and resolution.
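As an illustration of the API side, here is a sketch of a job specification for the Databricks Jobs API 2.1 (the endpoint used when creating a Workflows job programmatically). The job name, notebook path, and cluster sizing are hypothetical examples, not recommendations.

```python
import json

# Hypothetical job spec for POST /api/2.1/jobs/create.
# Notebook path, cluster type, and cron schedule are placeholders.
job_spec = {
    "name": "nightly-sales-pipeline",
    "tasks": [
        {
            "task_key": "transform_sales",
            "notebook_task": {"notebook_path": "/Repos/pipelines/transform_sales"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

body = json.dumps(job_spec)
# With the Databricks CLI, a payload like this could be submitted via:
#   databricks jobs create --json @job.json
print(body)
```

The same spec can be managed through the UI or checked into source control, which is what makes Workflows usable as an orchestration layer rather than just a scheduler.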
Databricks Workflows is well suited to managing the transformation and processing parts of your data pipeline, especially when all the heavy lifting happens inside Databricks. If your data is already stored in Azure Data Lake Storage (ADLS), you can easily read, process, and write it back without needing an external orchestration tool.
You can also use Workflows to efficiently extract data from cloud sources, web APIs, or other platforms that Databricks supports, making it a flexible option for handling data pipelines.
As you mentioned:
ability to connect to an on-premises SQL Server similar to the way you can connect to an on-premises SQL Server/database with ADF's self-hosted integration runtime?
If you need to extract data from on-premises sources that require a self-hosted integration runtime, or from data sources that Databricks JDBC connectors do not handle efficiently, it is best to use a dedicated ETL tool like Azure Data Factory for the job.
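To show where the gap is: Databricks can read SQL Server over JDBC, but only when the cluster has direct network line-of-sight to the server (for example via VNet peering or a private endpoint). There is no equivalent of ADF's self-hosted integration runtime agent for reaching an isolated on-prem host. A sketch of the JDBC configuration, with a hypothetical hostname and table:

```python
# Hypothetical JDBC options for reading SQL Server from a Databricks cluster.
# This only works if the cluster can reach the host over the network;
# there is no self-hosted-runtime-style agent as in ADF.
host, port, database = "sqlserver.internal.example.com", 1433, "sales"
jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database};encrypt=true"

jdbc_options = {
    "url": jdbc_url,
    "dbtable": "dbo.orders",
    "user": "etl_user",            # in practice, pull credentials from a
    "password": "<from-secrets>",  # Databricks secret scope, not literals
}

# On a cluster this would be executed as:
#   df = spark.read.format("jdbc").options(**jdbc_options).load()
print(jdbc_url)
```

When that network path does not exist, ADF with a self-hosted integration runtime remains the more practical choice for the extraction step.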
References: Orchestration - Databricks Workflows vs. Azure Data Factory; Databricks Workflows: A fully managed orchestration service for the Lakehouse
Upvotes: 1