Reputation: 1471
I am using Azure Data Factory to copy data from a REST API to Azure Data Lake Store. The following is the JSON of my Copy activity:
{
    "name": "CopyDataFromGraphAPI",
    "type": "Copy",
    "policy": {
        "timeout": "7.00:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false
    },
    "typeProperties": {
        "source": {
            "type": "HttpSource",
            "httpRequestTimeout": "00:30:40"
        },
        "sink": {
            "type": "AzureDataLakeStoreSink"
        },
        "enableStaging": false,
        "cloudDataMovementUnits": 0,
        "translator": {
            "type": "TabularTranslator",
            "columnMappings": "id: id, name: name, email: email, administrator: administrator"
        }
    },
    "inputs": [
        {
            "referenceName": "MembersHttpFile",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "MembersDataLakeSink",
            "type": "DatasetReference"
        }
    ]
}
The REST API was created by me. For testing purposes it initially returned just 2,500 rows, and my pipeline worked fine: it copied the data from the REST API call to Azure Data Lake Store.
After testing, I updated the REST API so that it now returns 125,000 rows. I tested the API in a REST client and it works fine, but in Azure Data Factory's Copy activity it gives the following error while copying data to Azure Data Lake Store:
{
    "errorCode": "2200",
    "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedToReadHttpFile,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to read data from http source file.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The remote server returned an error: (500) Internal Server Error.,Source=System,'",
    "failureType": "UserError",
    "target": "CopyDataFromGraphAPI"
}
The sink side is Azure Data Lake Store. Is there any limit on the size of the content I can copy from a REST call to Azure Data Lake Store?
I also retested the pipeline: with the REST API returning 2,500 rows it works fine, but as soon as the API returns 125,000 rows the pipeline starts giving the same error as above.
My source dataset in the Copy activity is:
{
    "name": "MembersHttpFile",
    "properties": {
        "linkedServiceName": {
            "referenceName": "WM_GBS_LinikedService",
            "type": "LinkedServiceReference"
        },
        "type": "HttpFile",
        "structure": [
            {
                "name": "id",
                "type": "String"
            },
            {
                "name": "name",
                "type": "String"
            },
            {
                "name": "email",
                "type": "String"
            },
            {
                "name": "administrator",
                "type": "Boolean"
            }
        ],
        "typeProperties": {
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects",
                "jsonPathDefinition": {
                    "id": "$.['id']",
                    "name": "$.['name']",
                    "email": "$.['email']",
                    "administrator": "$.['administrator']"
                }
            },
            "relativeUrl": "api/workplace/members",
            "requestMethod": "Get"
        }
    }
}
My sink dataset is:
{
    "name": "MembersDataLakeSink",
    "properties": {
        "linkedServiceName": {
            "referenceName": "DataLakeLinkService",
            "type": "LinkedServiceReference"
        },
        "type": "AzureDataLakeStoreFile",
        "structure": [
            {
                "name": "id",
                "type": "String"
            },
            {
                "name": "name",
                "type": "String"
            },
            {
                "name": "email",
                "type": "String"
            },
            {
                "name": "administrator",
                "type": "Boolean"
            }
        ],
        "typeProperties": {
            "format": {
                "type": "JsonFormat",
                "filePattern": "arrayOfObjects",
                "jsonPathDefinition": {
                    "id": "$.['id']",
                    "name": "$.['name']",
                    "email": "$.['email']",
                    "administrator": "$.['administrator']"
                }
            },
            "fileName": "WorkplaceMembers.json",
            "folderPath": "rawSources"
        }
    }
}
Upvotes: 0
Views: 4494
Reputation: 3209
As far as I know, there is no limit on file size. I've had a 10 GB CSV with millions of rows and Data Lake Store doesn't care.
What I can see is that, while the error message says the failure happened on the "Sink" side, the error code is UserErrorFailedToReadHttpFile. So I think the issue may be solved if you increase the httpRequestTimeout on your source. Right now it is "00:30:40", and the row transfer may be getting cut off when it expires. Thirty minutes is plenty of time for 2,500 rows, but maybe 125,000 don't fit in that window.
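As a rough sketch, the source of your Copy activity would look something like the snippet below. The "02:00:00" value is just an illustrative guess on my part; pick whatever comfortably covers how long your API needs to stream the full 125,000-row response.
"source": {
    "type": "HttpSource",
    "httpRequestTimeout": "02:00:00"
}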
Hope this helped!
Upvotes: 0