marz

Reputation: 93

Data Factory - Copy Activity - Settings for Big XML

I have a copy activity built in Azure Data Factory V2, where the data source is an SFTP folder with several XML files and the sink is an Azure PostgreSQL database.

I have successfully used the copy activity for small files (20 MB), but I also have 3 large XML files of 3 GB, 4.5 GB, and 18 GB. What settings should I use in the copy activity to handle files of this size?

Upvotes: 1

Views: 330

Answers (1)

Abhishek Khandave

Reputation: 3230

A single copy activity can take advantage of scalable compute resources.

When using the Azure integration runtime (IR), you can specify up to 256 data integration units (DIUs) for each copy activity, in a serverless manner (see the sketch after this list). When using a self-hosted IR, you can take either of the following approaches:

  • Manually scale up the machine.

  • Scale out to multiple machines (up to 4 nodes), and a single copy activity will partition its file set across all nodes.
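Both settings live on the copy activity itself. As a rough sketch, the DIU setting is the dataIntegrationUnits property in the activity JSON; the dataset names (SftpXmlFiles, PostgresTable) and the value of 32 are placeholders for illustration, not a recommendation:

    {
        "name": "CopyLargeXml",
        "type": "Copy",
        "inputs": [ { "referenceName": "SftpXmlFiles", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "PostgresTable", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "XmlSource", "storeSettings": { "type": "SftpReadSettings" } },
            "sink": { "type": "AzurePostgreSqlSink" },
            "dataIntegrationUnits": 32
        }
    }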

A single copy activity reads from and writes to the data store using multiple threads in parallel.
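This per-activity parallelism is exposed as the parallelCopies property, which the service normally determines automatically. As a minimal sketch, it sits in the same typeProperties block as the DIU setting above (source and sink omitted for brevity; 8 is an arbitrary illustrative value):

    "typeProperties": {
        "dataIntegrationUnits": 32,
        "parallelCopies": 8
    }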

You can also run multiple copy activities in parallel by using control flow constructs such as the ForEach activity.
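For example, a ForEach activity with isSequential set to false runs one copy per file concurrently. A minimal sketch, assuming a hypothetical pipeline parameter fileList holding the file names and datasets parameterized per file; the batchCount of 3 is arbitrary:

    {
        "name": "CopyXmlFilesInParallel",
        "type": "ForEach",
        "typeProperties": {
            "items": { "value": "@pipeline().parameters.fileList", "type": "Expression" },
            "isSequential": false,
            "batchCount": 3,
            "activities": [
                {
                    "name": "CopyOneXmlFile",
                    "type": "Copy",
                    "inputs": [ { "referenceName": "SftpXmlFiles", "type": "DatasetReference" } ],
                    "outputs": [ { "referenceName": "PostgresTable", "type": "DatasetReference" } ],
                    "typeProperties": {
                        "source": { "type": "XmlSource", "storeSettings": { "type": "SftpReadSettings" } },
                        "sink": { "type": "AzurePostgreSqlSink" }
                    }
                }
            ]
        }
    }

Each iteration gets its own copy activity run, so the DIU and parallelCopies settings described above apply to each file independently.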

For more information, see the following articles about solution templates:

  • Copy files from multiple containers

  • Migrate data from Amazon S3 to ADLS Gen2

  • Bulk copy with a control table

Upvotes: 0
