dng

Reputation: 511

Difference between "Dataset" and "Inline" sources in Azure Data Factory dataflows?

What is the difference between the two source types, "Dataset" and "Inline", in an Azure Data Factory data flow source? In which situations should I use one instead of the other?

I've read the official Microsoft documentation, but I couldn't figure it out:

When a format is supported for both inline and in a dataset object, there are benefits to both. Dataset objects are reusable entities that can be used in other data flows and activities such as Copy. These reusable entities are especially useful when you use a hardened schema. Datasets aren't based in Spark. Occasionally, you might need to override certain settings or schema projection in the source transformation.

Inline datasets are recommended when you use flexible schemas, one-off source instances, or parameterized sources. If your source is heavily parameterized, inline datasets allow you to not create a "dummy" object. Inline datasets are based in Spark, and their properties are native to data flow.

Upvotes: 4

Views: 12301

Answers (2)

JoeSoap223

Reputation: 1

In short: Inline connects directly to the "Linked Service" object, while Dataset connects to the "Dataset" object, as they exist in ADF.

A "Dataset" itself connects to a "Linked Service", so using Inline simply skips a (sometimes) unnecessary object.
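
To make the object chain concrete, here is a rough Python sketch of how the two kinds of source entry might look in a data flow definition. It is a simplified illustration, not a full definition: the reference names (`CustomerCsvDataset`, `BlobStorageLS`) and the `source_kind` helper are hypothetical, and the `dataset` / `linkedService` keys are assumed to mirror what an exported data flow JSON contains, so check them against your own factory.

```python
# Simplified sketch of two "sources" entries in a mapping data flow definition.
# All reference names below are hypothetical placeholders.

dataset_source = {
    "name": "CustomersFromDataset",
    # Chain: data flow -> Dataset -> Linked Service
    "dataset": {
        "referenceName": "CustomerCsvDataset",
        "type": "DatasetReference",
    },
}

inline_source = {
    "name": "CustomersInline",
    # Chain: data flow -> Linked Service (no Dataset object in between)
    "linkedService": {
        "referenceName": "BlobStorageLS",
        "type": "LinkedServiceReference",
    },
}


def source_kind(source: dict) -> str:
    """Classify a source entry by which reference key it carries."""
    if "dataset" in source:
        return "Dataset"
    if "linkedService" in source:
        return "Inline"
    return "Unknown"


for src in (dataset_source, inline_source):
    print(src["name"], "->", source_kind(src))
```

With an inline source, format and location settings live in the source transformation itself rather than in a separate Dataset object.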

Upvotes: 0

Joel Cochran

Reputation: 7758

Datasets are an additional level of abstraction and were historically required. Datasets definitely have their place, as they offer additional features such as Schemas and Parameters, but the original requirement meant that you often ended up with a large number of Dataset objects in your repository, even for one-off projects.

Inline lets you access certain (but not all) Linked Service resources without creating yet another Dataset object. If your operation doesn't need a schema, or you don't need a Dataset object for reuse in multiple projects, then Inline is the cleaner option. I'll mention this since the doc you quoted does: Inline can use Pipeline parameters, so the solution can still be dynamic.

As for recommendations, I would start with Inline and graduate to Datasets when the situation merits.

Upvotes: 12
