DilTeam

Reputation: 2661

How do I scale Azure Data Factory Dataflow?

I was able to set up an SCD Type 2 process quite easily using the ADF UI for one table, BUT I don't see an easy way to scale it to the thousands of datasources we have. I don't see any Java APIs that would let me write ADF pipelines/data flows and configure and trigger them dynamically, and there's no UI for choosing which tables to pull from a particular database, etc. I looked at Azure Data Lake Gen 2, Azure Databricks, etc., and I don't see any tool in Azure that would let us replace the UI-driven data lake ingestion process we've built in house. Am I missing something?

On a side note, we have an old data lake application that ingests data from thousands of datasources (databases, log files, web applications, etc.) and stores it on HDFS (a typical architecture), using technologies such as Java, Spark, and Kafka. We're evaluating Azure Data Factory to replace it.

Upvotes: 0

Views: 1408

Answers (2)

David Moore

Reputation: 161

You could leverage the REST API from Java to build out pipelines using code.

https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-rest-api
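For illustration, here is a minimal Java sketch of that approach: a PUT against the pipelines endpoint of the Azure Management REST API. The subscription, resource group, factory, and pipeline names are placeholders, and the bearer token is assumed to come from whatever AAD auth mechanism you already use (MSAL, azure-identity, the az CLI, ...).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreatePipeline {
    public static void main(String[] args) throws Exception {
        // Placeholder identifiers -- substitute your own subscription,
        // resource group, factory, and pipeline names.
        String url = "https://management.azure.com/subscriptions/<subId>"
                + "/resourceGroups/<rg>/providers/Microsoft.DataFactory"
                + "/factories/<factory>/pipelines/CopyTablePipeline"
                + "?api-version=2018-06-01";

        // An AAD bearer token for https://management.azure.com/,
        // acquired however you prefer.
        String token = System.getenv("AZURE_MGMT_TOKEN");

        // Minimal pipeline definition; a real one would carry your
        // activities (Copy, ExecuteDataFlow, ForEach, ...).
        String body = "{ \"properties\": { \"activities\": [] } }";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

From there it's essentially string templating: generate one pipeline body per datasource, or a single parameterized pipeline, and PUT each definition the same way.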

Upvotes: 0

Mark Kromer MSFT

Reputation: 3838

There is a generic SCD (Type 1, but you can retrofit to Type 2) example built into ADF. Go to New > Pipeline from template > Transform with Data flows > Generic SCD Type 1.

This pattern is outlined here: https://techcommunity.microsoft.com/t5/azure-data-factory/create-generic-scd-pattern-in-adf-mapping-data-flows/ba-p/918519.

You can also use a ForEach activity inside a pipeline to iterate over schemaless table datasets, calling the same data flow on every iteration (see the sketch below).
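As a rough sketch of that shape, here is a hypothetical pipeline body held in a Java text block (so it can be sent with a REST call like the one in the other answer). The names GenericScdFlow, tableList, and the activity names are made up, and the field names reflect the 2018-06-01 API as I recall it, so verify them against the docs:

```java
public class ForEachPipelineJson {
    // Hypothetical pipeline body: a ForEach over a table-name parameter
    // that runs the same mapping data flow once per table.
    static final String PIPELINE_JSON = """
        {
          "properties": {
            "parameters": { "tableList": { "type": "array" } },
            "activities": [{
              "name": "IterateTables",
              "type": "ForEach",
              "typeProperties": {
                "items": { "value": "@pipeline().parameters.tableList",
                           "type": "Expression" },
                "activities": [{
                  "name": "RunScdFlow",
                  "type": "ExecuteDataFlow",
                  "typeProperties": {
                    "dataFlow": { "referenceName": "GenericScdFlow",
                                  "type": "DataFlowReference" }
                  }
                }]
              }
            }]
          }
        }
        """;

    public static void main(String[] args) {
        System.out.println(PIPELINE_JSON); // body for a PUT to the pipelines endpoint
    }
}
```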

Lastly, if you still wish to stamp out data flows programmatically, the .NET and PowerShell SDKs are listed in the references section of the online Azure docs.

Upvotes: 0
