Reputation: 65
Situation: Every day a batch of JSON files is generated and put into Azure Blob storage. Also every day, an Azure Data Factory copy job does a lookup in the blob storage and applies a "Filter by last modified":
Start time: @adddays(utcnow(),-2)
End time: @utcnow()
The files are copied to Azure Data Lake Storage Gen2.
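In the Copy activity JSON, that filter corresponds roughly to the source store settings below (a sketch, not copied from the real pipeline; the JsonSource/AzureBlobStorageReadSettings type names and the wildcard are my assumptions about the setup):

```json
"source": {
    "type": "JsonSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFileName": "*.json",
        "modifiedDatetimeStart": { "value": "@adddays(utcnow(), -2)", "type": "Expression" },
        "modifiedDatetimeEnd": { "value": "@utcnow()", "type": "Expression" }
    }
}
```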
On normal days with 50-100 new JSON files the copy job runs fine, but on the last day of every quarter the number of new JSON files increases to 10,000+ and the copy job then fails with the message "ErrorCode=SystemErrorFailToInsertSubJobForTooLargePayload,….."
I have therefore built a new pipeline that uses a ForEach loop to run copy activities in parallel. This can handle much larger volumes of files, but it still takes a couple of minutes per file, and I have not seen more than around 500 files per hour being copied, so it is still not fast enough.
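The ForEach is set up roughly like this (a sketch; the activity names are placeholders, the items reference depends on how the lookup/filter is built, and the inner Copy activity is abbreviated):

```json
{
    "name": "CopyEachFile",
    "type": "ForEach",
    "typeProperties": {
        "items": { "value": "@activity('FilterNewFiles').output.value", "type": "Expression" },
        "isSequential": false,
        "batchCount": 50,
        "activities": [
            { "name": "CopyOneFile", "type": "Copy" }
        ]
    }
}
```

As far as I know, batchCount is capped at 50 and each per-file Copy activity pays its own queue/startup time, which would explain the roughly 500 files per hour ceiling.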
I am therefore searching for more ways to optimize the copy. I have included a couple of screenshots but can give more details on specifics.
Upvotes: 1
Views: 575
Reputation: 4544
The issue is that the payload is too large to be processed with your current configuration (assuming you are using the default settings).
You can optimize Copy activity performance by adjusting the underlying settings in your Azure Data Factory (ADF) environment.
Try the Performance Tuning Steps in ADF to increase throughput.
Configure the copy optimization features on the Settings tab of the Copy activity, for example Data Integration Units (DIUs) and the degree of copy parallelism.
Refer to Copy activity performance optimization features for more details and a better understanding.
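As a rough illustration of where those features live in the JSON (values are illustrative, and the source/sink type names are my assumptions for a JSON-in-blob to ADLS Gen2 copy, not your exact pipeline): dataIntegrationUnits and parallelCopies are the main knobs behind the Settings tab.

```json
{
    "name": "CopyNewJsonFiles",
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "JsonSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": true,
                "wildcardFileName": "*.json",
                "modifiedDatetimeStart": { "value": "@adddays(utcnow(), -2)", "type": "Expression" },
                "modifiedDatetimeEnd": { "value": "@utcnow()", "type": "Expression" }
            }
        },
        "sink": {
            "type": "JsonSink",
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings",
                "maxConcurrentConnections": 32
            }
        },
        "dataIntegrationUnits": 32,
        "parallelCopies": 32
    }
}
```

Higher DIU and parallel-copy values let the service fan the transfer out internally, so the same last-modified window may become copyable in a single Copy activity instead of one Copy per file.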
Upvotes: 1