vin

Reputation: 195

Filesystem SDK vs Azure Data Factory

I'm very new to Azure Data Lake Storage and am currently training on Data Factory. I have a developer background, so right away I'm not a fan of the 'tools' approach to development. I really don't like how there are all these settings to set and objects you have to create everywhere. I much prefer a code approach, which lets us detach the logic from the service (I don't like having to publish in order to save), see everything by scrolling or navigating to different objects in a project, see differences more easily in source control, etc. So I found Microsoft's Filesystem SDK, which seems to be an alternative to Data Factory: https://azure.microsoft.com/en-us/blog/filesystem-sdks-for-azure-data-lake-storage-gen2-now-generally-available/

What has been your experience using this approach? Is it a good alternative? Is there a way to run SDK code in Data Factory, so that we can leverage scheduling and triggers? I guess I'm looking for pros and cons.

Thank you.

Upvotes: 0

Views: 114

Answers (2)

David Browne - Microsoft

Reputation: 89091

The code-centric alternative to Azure Data Factory for building and managing your Azure Data Lake is Spark, typically on either Azure Databricks or Azure Synapse Spark.

Upvotes: 0

Peter Bons

Reputation: 29780

Well, the docs refer to several SDKs, one of them being the .NET SDK, and the title is:

Use .NET (or Python or Java etc.) to manage directories, files, and ACLs in Azure Data Lake Storage Gen2

So, the SDK lets you manage the filesystem only. It has no support for triggers, pipelines, data flows and the like. You will have to stick with Azure Data Factory for that.
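To give an idea of what "manage the filesystem" looks like in practice, here is a rough Python sketch, assuming the azure-storage-file-datalake package. The helper names (account_url, connect, upload_text, list_names) are made up for illustration and are not part of the SDK; only the client calls inside them are SDK methods. Treat it as a sketch, not a reference implementation.

```python
from typing import List


def account_url(account_name: str) -> str:
    """Build the ADLS Gen2 (dfs) endpoint URL for a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def connect(account_name: str, file_system: str, credential):
    """Return a file system client (hypothetical helper).

    Assumes `pip install azure-storage-file-datalake`; `credential` can be
    an account key or a token credential object.
    """
    # Imported lazily so the other helpers can be tried without the package.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url=account_url(account_name), credential=credential
    )
    return service.get_file_system_client(file_system)


def upload_text(fs_client, directory: str, filename: str, text: str) -> None:
    """Create a directory and write a small text file into it."""
    dir_client = fs_client.create_directory(directory)
    file_client = dir_client.get_file_client(filename)
    file_client.upload_data(text.encode("utf-8"), overwrite=True)


def list_names(fs_client, directory: str) -> List[str]:
    """List the paths found under a directory."""
    return [p.name for p in fs_client.get_paths(path=directory)]


# Example usage (needs a real storage account, so it is left commented out):
# fs = connect("myaccount", "my-file-system", "<account-key>")
# upload_text(fs, "raw", "hello.txt", "hi")
# print(list_names(fs, "raw"))
```

Note that everything here is about directories and files. Nothing in it schedules, triggers or orchestrates anything, which is the part Data Factory covers.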

Regarding this:

I'm not a fan of the 'tools' approach for development

I hate to tell you, but the world is moving that way whether you like it or not. Take Logic Apps, for example. Azure Data Factory isn't aimed at the hardcore developer but fulfils a need for people working with large sets of data, like data engineers. I'm glad that it at least integrates very well with Git. Yes, there is some overhead in defining sinks and sources, but they are reusable across pipelines.

If you really want to use code, try Azure Databricks. Take a look at this Q&A as well.

TL;DR: The FileSystem SDK is not an alternative.

Upvotes: 1
