Dave Od
Dave Od

Reputation: 3

Azure ADF sliceIdentifierColumnName is not populating correctly

I've set up a ADF pipeline using a sliceIdentifierColumnName which has worked well as it populated the field with a GUID as expected. Recently however this field stopped being populated, the refresh would work but the sliceIdentifierColumnName field would have a value of null, or occasionally the load would fail as it attempted to populate this field with a value of 1 which causes the slice load to fail.

This change occurred at a point in time, before it worked perfectly, after it repeatedly failed to populate the field correctly. I'm sure no changes were made to the Pipeline which caused this to suddenly fail. Any pointers where I should be looking?

Here an extract of the pipeline source, I'm reading from a table in Amazon Redshift and writing to an Azure SQL table.

    "activities": [
        {
            "type": "Copy",
            "typeProperties": {
                "source": {
                    "type": "RelationalSource",
                    "query": "$$Text.Format('select * from mytable where eventtime >= \\'{0:yyyy-MM-ddTHH:mm:ssZ}\\'   and   eventtime < \\'{1:yyyy-MM-ddTHH:mm:ssZ}\\'       ' , SliceStart, SliceEnd)"
                },
                "sink": {
                    "type": "SqlSink",
                    "sliceIdentifierColumnName": "ColumnForADFuseOnly",
                    "writeBatchSize": 0,
                    "writeBatchTimeout": "00:00:00"
                }
            },
            "inputs": [
                {
                    "name": "AmazonRedshiftSomeName"
                }
            ],
            "outputs": [
                {
                    "name": "AzureSQLDatasetSomeName"
                }
            ],
            "policy": {
                "timeout": "1.00:00:00",
                "concurrency": 10,
                "style": "StartOfInterval",
                "longRetry": 0,
                "longRetryInterval": "00:00:00"
            },
            "scheduler": {
                "frequency": "Hour",
                "interval": 2
            },
            "name": "Activity-somename2Hour"
        }
    ],

Also, here is the error output text

Copy activity encountered a user error at Sink:.database.windows.net side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'ColumnForADFuseOnly' contains an invalid value '1'.,Source=Microsoft.DataTransfer.Common,''Type=System.ArgumentException,Message=Type of value has a mismatch with column typeCouldn't store <1> in ColumnForADFuseOnly Column. Expected type is Byte[].,Source=System.Data,''Type=System.ArgumentException,Message=Type of value has a mismatch with column type,Source=System.Data,'.

Here is part of the source dataset, it's a table with all datatypes as Strings.

{
    "name": "AmazonRedshiftsomename_2hourly",
    "properties": {
        "structure": [
            {
                "name": "eventid",
                "type": "String"
            },
            {
                "name": "visitorid",
                "type": "String"
            },
            {
                "name": "eventtime",
                "type": "Datetime"
            }
}

Finally, the target table is identical to the source table, mapping each column name to its counterpart in Azure, with the exception of the additional column in Azure named [ColumnForADFuseOnly] binary NULL, It is this column which is now either being populated with NULLs or 1. thanks,

Upvotes: 0

Views: 229

Answers (1)

SQLBadPanda
SQLBadPanda

Reputation: 635

You need to define [ColumnForADFuseOnly] as binary(32), binary with no length modifier is defaulting to a length of 1 and thus truncating your sliceIdentifier...

When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30. See here

Upvotes: 0

Related Questions