Julian

Reputation: 107

Azure Data Factory is creating weird filenames in my data sink - why?

I'm encountering an issue with my Azure Data Factory Pipeline. Within the pipeline, I've configured a Copy activity and a data flow.

The Copy activity is set up to connect to a PostgreSQL database using a query to retrieve specific columns. It then stores this data in a file within a Storage Container.

Following this, the data flow processes the retrieved data, making some minor modifications, and ultimately outputs the final result to a datasink. I've defined a dataset (CSV file) to contain the final output, which is stored in a different Azure Data Storage Container.

However, despite specifying a fixed name for the CSV file, whenever I trigger the pipeline, Azure Data Factory generates a CSV file with a seemingly random name and places it in my storage container. This happens consistently each time I trigger the pipeline, so in the end I have multiple files with odd names instead of always having one file with my defined file name.

Could someone please shed some light on why this is happening? Random weird name: part-00000-6793304a-9f39-4b09-bcf6-8ea78f15ac2d-c000.csv


Upvotes: 0

Views: 338

Answers (1)

Rakesh Govindula

Reputation: 11234

Could someone please shed some light on why this is happening? Random weird name: part-00000-6793304a-9f39-4b09-bcf6-8ea78f15ac2d-c000.csv

It's due to the partitioning in the dataflow sink. By default, the dataflow generates part files according to the current partitioning. Data flows run on the Spark architecture, and that's why the output files are named like this.

To get a single output file from the dataflow, go to the dataflow sink -> Settings -> change the File name option from Default to Output to single file and enter the required file name there.

For a single output file, the partitioning needs to be single, so set the sink partitioning to Single partition as well.
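For reference, with those two settings applied, the sink definition in the data flow script should look roughly like this (the stream name `ModifiedData`, the sink name `sink1`, and the file name are placeholders; `partitionFileNames` and `partitionBy('hash', 1)` are what the UI writes for "Output to single file" with single partitioning):

```
ModifiedData sink(allowSchemaDrift: true,
    validateSchema: false,
    partitionFileNames:['final_output.csv'],
    partitionBy('hash', 1)) ~> sink1
```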


In the target dataset, give the path till your target folder location.
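For illustration, the sink dataset JSON might look something like this (the dataset, linked service, container, and folder names are made up). Note that the location has only a container and folderPath, and no fileName, since the file name comes from the dataflow sink settings:

```json
{
    "name": "TargetCsvDataset",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output",
                "folderPath": "final"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
```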


The target file name is taken from the dataflow sink settings, not from the sink dataset; the dataset only determines the folder location where the file is generated.

Now, execute the dataflow from the pipeline and you will see a single output file with the required name.


Upvotes: 1
