humbleice

Reputation: 906

Blob Storage Sink Timeout During Azure Data Factory Copy Activity

I am trying to backup Azure Table Storage tables to Azure Blob Storage using the Copy data activity in an Azure Data Factory pipeline.

This is mostly working fine, but for a few of my storage accounts, certain tables with a large volume of records consistently fail. The error message in the pipeline run is: "The client could not finish the operation within specified timeout."

I have tracked the issue down to intermittent spikes when reading Azure Table Storage; most requests during the backup process take sub-250ms to complete, but occasionally some take ~19000ms to complete. When these read spikes from Azure Table storage occur, the blob upload times out. These spikes occur consistently enough that I'm never able to finish a backup of these tables.

Is there a way to increase the sink timeout when writing to blob storage, or is there another configuration I can use to help these uploads succeed?

Here's an example of the error from the Pipeline run:

{
    "dataRead": 2073796386,
    "dataWritten": 211593105,
    "filesWritten": 12,
    "sourcePeakConnections": 1,
    "sinkPeakConnections": 1,
    "rowsRead": 1272000,
    "rowsCopied": 1272000,
    "copyDuration": 372,
    "throughput": 7302.1,
    "logFilePath": "backup-logs/copyactivity-logs/Copy to Blob Storage/163560cd-ec4b-4fba-86f2-c6a5ea8ce584/",
    "errors": [
        {
            "Code": 9011,
            "Message": "ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The file operation is failed, upload file failed at path: 'backups/20230908/activities/activities_00012.parquet'.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The client could not finish the operation within specified timeout.,Source=Microsoft.WindowsAzure.Storage,''Type=System.TimeoutException,Message=The client could not finish the operation within specified timeout.,Source=,'",
            "EventType": 0,
            "Category": 5,
            "Data": {},
            "MsgId": null,
            "ExceptionType": null,
            "Source": null,
            "StackTrace": null,
            "InnerEventInfos": []
        }
    ],
    "effectiveIntegrationRuntime": "PrivateNetworkIntegrationRuntime (North Central US)",
    "usedDataIntegrationUnits": 4,
    "billingReference": {
        "activityType": "DataMovement",
        "billableDuration": [
            {
                "meterType": "ManagedVNetIR",
                "duration": 0.4666666666666667,
                "unit": "DIUHours"
            }
        ]
    },
    "usedParallelCopies": 1,
    "executionDetails": [
        {
            "source": {
                "type": "AzureTableStorage"
            },
            "sink": {
                "type": "AzureBlobStorage"
            },
            "status": "Failed",
            "start": "9/8/2023, 10:21:57 AM",
            "duration": 372,
            "usedDataIntegrationUnits": 4,
            "usedParallelCopies": 1,
            "profile": {
                "queue": {
                    "status": "Completed",
                    "duration": 87
                },
                "transfer": {
                    "status": "Completed",
                    "duration": 284,
                    "details": {
                        "readingFromSource": {
                            "type": "AzureTableStorage",
                            "workingDuration": 274,
                            "timeToFirstByte": 0
                        },
                        "writingToSink": {
                            "type": "AzureBlobStorage",
                            "workingDuration": 4
                        }
                    }
                }
            },
            "detailedDurations": {
                "queuingDuration": 87,
                "timeToFirstByte": 0,
                "transferDuration": 284
            }
        }
    ],
    "dataConsistencyVerification": {
        "VerificationResult": "NotVerified"
    },
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}

The "General" timeout configuration for the Copy data activity is already set to 12 hours so this isn't a factor. The Copy activity would take less than an hour if it ran to completion (based on other storage accounts with the same tables that don't have this issue).

I've tried a few things to get the pipeline to succeed. I've run the backup of a single table in isolation to rule out a load issue, but it still fails. I also adjusted "Maximum Data Integration Units" and "Degree of Copy Parallelism" to slow down the copy process (again to reduce load), but this had no impact either.

Upvotes: 0

Views: 701

Answers (1)

Pratik Lad

Reputation: 8291

AFAIK, the default timeout for writing data to the sink is 2 hours, and there is no setting to increase it.

For larger tables (that is, tables with a volume of 100 GB or greater, or that can't be migrated to Azure within two hours), we recommend that you partition the data.

To resolve this, you can try the options below to partition the data.

Write a particular subset of rows to each file, so that no single copy exceeds the sink timeout.
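As a rough sketch of what that partitioning can look like in the Copy activity JSON (the `PartitionKey` range values here are placeholders; adjust them to your table's actual key distribution), the source can use an `azureTableSourceQuery` OData filter so each activity run copies only one slice of the table:

```
{
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "AzureTableSource",
            "azureTableSourceQuery": "PartitionKey ge 'A' and PartitionKey lt 'M'"
        },
        "sink": {
            "type": "ParquetSink"
        }
    }
}
```

You can run several such Copy activities (or one parameterized activity in a ForEach) with non-overlapping filter ranges, each writing its slice to a separate file.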

After this, use another Copy activity to merge the files into one, so all the data ends up in a single file.
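A minimal sketch of that merge step, assuming the partitioned files were written as Parquet under a single folder (the folder and wildcard paths are illustrative): the second Copy activity reads all files from the folder and uses the `MergeFiles` copy behavior on the sink to write one output file:

```
{
    "type": "Copy",
    "typeProperties": {
        "source": {
            "type": "ParquetSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "wildcardFolderPath": "backups/20230908/activities",
                "wildcardFileName": "*.parquet"
            }
        },
        "sink": {
            "type": "ParquetSink",
            "storeSettings": {
                "type": "AzureBlobStorageWriteSettings",
                "copyBehavior": "MergeFiles"
            }
        }
    }
}
```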

Upvotes: 0
