pzien

Reputation: 1

Azure Data Factory cache sink contains only a subset of data

In a data flow that includes a cache sink, only a subset of the data seems to be available as output for the next activity, even though the sink activity itself appears to read all the records and the sink contains all of them.

Somehow, when run as a data flow activity, the number of rows in the sink output is 145, not 1120. What could be the reason?

I have tried using a single partition, but it did not affect the result. The row count is also the same for a bigger data set. Is there some kind of limit?

Definition of dataflow activity:

{
    "name": "Data flow1",
    "type": "ExecuteDataFlow",
    "dependsOn": [],
    "policy": {
        "timeout": "0.12:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
    },
    "userProperties": [],
    "typeProperties": {
        "dataflow": {
            "referenceName": "create_batch_dataflow",
            "type": "DataFlowReference"
        },
        "compute": {
            "coreCount": 8,
            "computeType": "General"
        },
        "traceLevel": "None",
        "cacheSinks": {
            "firstRowOnly": false
        }
    }
}
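
For reference, the next activity reads the cached rows through the data flow activity's runStatus output. A minimal sketch of such a consumer, assuming the cache sink inside create_batch_dataflow is named cacheSink1 (the sink name and the variable name cachedRows are assumptions, not taken from the actual pipeline):

{
    "name": "Read cached rows",
    "type": "SetVariable",
    "dependsOn": [
        {
            "activity": "Data flow1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "variableName": "cachedRows",
        "value": {
            "value": "@string(activity('Data flow1').output.runStatus.output.cacheSink1.value)",
            "type": "Expression"
        }
    }
}

It is the array under runStatus.output.<sinkName>.value that comes back with 145 rows instead of 1120.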

Upvotes: 0

Views: 312

Answers (1)

pzien

Reputation: 1

I guess the output dataset exceeds the payload limit for the next activity's input, and it gets truncated.
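
Data Factory caps the JSON output an activity can return (on the order of 4 MB), so cached rows beyond that size would not reach downstream activities. One possible workaround, sketched here rather than confirmed, is to write the rows to a real sink (for example a delimited-text dataset in blob storage) and read them back with a Lookup activity; note the Lookup activity has its own limits (roughly 5,000 rows / 4 MB). The dataset name batch_output_csv below is an assumption for illustration:

{
    "name": "Read batch rows",
    "type": "Lookup",
    "dependsOn": [
        {
            "activity": "Data flow1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": false
            }
        },
        "dataset": {
            "referenceName": "batch_output_csv",
            "type": "DatasetReference"
        },
        "firstRowOnly": false
    }
}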

Upvotes: 0
