Reputation: 2661
I was able to setup a SCD Type 2 process quite easily using the ADF UI for one table BUT I don't see an easy way to scale to the 1000s of datasources we've. I don't see any Java APIs that will allow me to write ADF Pipelines/Dataflow and configure & trigger them dynamically. No UI to allow which tables to choose from a particular database etc. I looked at Azure Datalake Gen 2, Azure Databricks etc. I don't see any tool in Azure that will allow us to replace the UI driven Data Lake ingestion process we've built in house. Am I missing something?
On a side note, we've an old Data lake application that ingests data from thousands of datasources such as Databases, log files, web applications etc and stores data on HDFS (a typical architecture) using technologies as Java, Spark, Kafka etc. We're evaluating Azure Active Data Factory to replace it.
Upvotes: 0
Views: 1408
Reputation: 161
You could leverage the REST API from Java to build out pipelines using code.
https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-rest-api
Upvotes: 0
Reputation: 3838
There is a generic SCD (Type 1, but you can retrofit to Type 2) example built into ADF. Go to New > Pipeline from template > Transform with Data flows > Generic SCD Type 1.
This pattern is outlined here: https://techcommunity.microsoft.com/t5/azure-data-factory/create-generic-scd-pattern-in-adf-mapping-data-flows/ba-p/918519.
You can also iterate over schemaless table datasets for Foreach inside a pipeline, calling the same data flow on every iteration.
Lastly, if you still wish to stamp-out data flows programmatically, the .NET and PowerShell SDKs are listed in the references section of the online Azure docs.
Upvotes: 0