Reputation: 97
I have a data flow with a few joins. When it reaches join #5, the number of rows jumps from 10,000 to 320,000 (just an example of how much the volume grows), and since I still have more joins after that, the data flow takes much longer to complete.
What I do now is add an Aggregate transformation after the joins and group by the fields I will use later, effectively using it the way I would use a SELECT DISTINCT in a query on the database, but it still takes very long to finish.
How can I make this data flow run faster?
Should I add an Aggregate (grouping by the fields) between every join to avoid the duplicates, or only add the Aggregate after the join where the row count starts to increase?
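To illustrate, my Aggregate-as-distinct step looks something like this in Data Flow Script (a simplified sketch: the stream name Join5, the key column id, and the output name are just placeholders); it groups by the key and keeps the first value of every other column:

    Join5 aggregate(groupBy(id),
        each(match(true()), $$ = first($$))) ~> DistinctAfterJoin5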
Thanks.
Upvotes: 2
Views: 2413
Reputation: 3838
Can you switch to Lookups instead of Joins and choose "run single row"? That provides the SELECT DISTINCT capability in a single step.
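In Data Flow Script a Lookup set up that way looks roughly like this (a sketch, not your exact flow: MainStream, DimStream, and the id columns are placeholder names, and I'm assuming the single-row match corresponds to multiple: false with pickup: 'any'):

    MainStream, DimStream lookup(MainStream@id == DimStream@id,
        multiple: false,
        pickup: 'any',
        broadcast: 'auto') ~> LookupSingleRow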
Also, to speed up the processing end-to-end, try bumping the data flow's Azure Integration Runtime up to memory optimized compute and raising the core count.
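Those settings live on the Azure Integration Runtime that the data flow activity runs on; the relevant part of the IR definition looks roughly like this (the IR name, core count, and time-to-live below are only example values):

    {
        "name": "DataFlowMemoryOptimizedIR",
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": "AutoResolve",
                    "dataFlowProperties": {
                        "computeType": "MemoryOptimized",
                        "coreCount": 16,
                        "timeToLive": 10
                    }
                }
            }
        }
    }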
Upvotes: 2