Jamil

Reputation: 858

How do I know when parallelism will be triggered in Azure Data Lake Analytics?

I have an Azure Data Lake Analytics job that processes around 3.8 million records stored in Azure Data Lake Store, using U-SQL user-defined operators.

On the first run I set parallelism to 10, and on the second run I set it to 1. Surprisingly, the job duration was the same for both executions (around 1.5 hours), so it looks like parallelism is not being triggered for my job. Is that because I used user-defined operators? How do I determine when parallelism will be triggered and when it will not?

Upvotes: 3

Views: 646

Answers (1)

Michael Rys

Reputation: 6684

Did you use user-defined functions or a custom UDO?

User-defined functions should not impede parallelism. A custom UDO may, depending on its internals.

What do the job graph vertices say?

You can analyze the parallelization by looking at the job graph. If you download the profile, you can also look at the vertex graph and use the Diagnostic tab to drill in further. Does the playback actually show parallel execution?

In general, the system should automatically parallelize your job based on the limit you specified, the size of the data, the complexity of the query operations, and the statistics gathered and estimated by the query processor.
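To make that concrete, here is a rough back-of-the-envelope sketch in Python. It is not how the service actually schedules work. It assumes a deliberately simplified model in which each allocated unit runs one vertex at a time and every vertex in a stage takes the same time. The function name and all the numbers are made up for illustration. It only shows why a stage that ends up as a single vertex can take the same time with a parallelism of 1 or 10.

```python
import math


def estimated_stage_minutes(vertex_count, allocated_units, minutes_per_vertex):
    """Toy estimate of stage duration (hypothetical model, not a service API).

    Assumes each allocated unit runs one vertex at a time and that all
    vertices take the same time, so vertices execute in equal-length waves.
    Ignores scheduling overhead, data skew, and stage dependencies.
    """
    if vertex_count < 1 or allocated_units < 1:
        raise ValueError("vertex_count and allocated_units must be at least 1")
    # Units beyond the number of vertices have nothing to run.
    effective_parallelism = min(vertex_count, allocated_units)
    waves = math.ceil(vertex_count / effective_parallelism)
    return waves * minutes_per_vertex


# Stage that runs as a single vertex: extra units sit idle.
single_1 = estimated_stage_minutes(1, 1, 90)
single_10 = estimated_stage_minutes(1, 10, 90)

# Same total work split across 20 vertices: extra units help.
split_1 = estimated_stage_minutes(20, 1, 4.5)
split_10 = estimated_stage_minutes(20, 10, 4.5)

print(single_1, single_10, split_1, split_10)
```

Under these assumptions the single-vertex stage takes the same time in both runs, while the 20-vertex stage gets much faster with 10 units. That is why the vertex counts in the job graph are the first thing to check.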

Upvotes: 4
