Reputation: 525
We have a dataflow that has worked fine for the last 30 days but failed last night. The error message from the pipeline reads:
{
"StatusCode": "DFExecutorUserError",
"Message": "Job failed due to reason: The given value of type NVARCHAR(255) from the data source cannot be converted to type nvarchar(100) of the specified target column.",
"Details": "shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: The given value of type NVARCHAR(255) from the data source cannot be converted to type nvarchar(100) of the specified target column.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.validateStringBinaryLengths(SQLServerBulkCopy.java:1690)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeColumn(SQLServerBulkCopy.java:3006)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeBatchData(SQLServerBulkCopy.java:3647)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1566)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$300(SQLServerBulkCopy.java:65)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:663)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jd"
}
The error message is not very precise: it does not name the column that fails the length check, which makes troubleshooting more difficult.
Both the destination and source tables are hosted in the same Azure SQL Database. The destination table contains two columns of length 100, while the corresponding source columns are defined as varchar(280) and varchar(290). The actual maximum length of the data in these source columns is 30 and 40 characters respectively (see the check sketched after the table definitions below).
--Source table
CREATE TABLE [pq].[USRS](
[USRS_ID] [varchar](100) NULL,
[USRS_FULL_NAME] [varchar](280) NULL,
/* STRIPPED NON-RELEVANT COLUMNS */
[USRS_EMAIL_ADDRESS] [varchar](290) NULL
/* STRIPPED NON-RELEVANT COLUMNS */
) ON [PRIMARY]
GO
--Destination table
CREATE TABLE [fin].[l20_dim_gebruiker](
[id] [int] IDENTITY(1,1) NOT NULL,
[gebruiker_id] [int] NOT NULL,
[volledige_naam] [varchar](100) NULL,
[emailadres] [varchar](100) NULL,
/* STRIPPED NON-RELEVANT COLUMNS */
CONSTRAINT [PK__l20_dim___3213E83F139E41DB] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
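For reference, the length check we ran against the source was roughly the following (a sketch using the column names from the DDL above; only LEN was used):

-- Character lengths of the two candidate source columns.
SELECT MAX(LEN(USRS_FULL_NAME))     AS max_full_name_len,
       MAX(LEN(USRS_EMAIL_ADDRESS)) AS max_email_len
FROM pq.USRS;

This is the check that produced the 30 and 40 character figures mentioned above.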
What could be causing this error message? How could we make the dataflow transformation more robust?
Upvotes: 1
Views: 1128
Reputation: 525
OK, so the error was on my side: the check I did on the source table was not conclusive.
SELECT MAX(LEN(USRS_FULL_NAME))        AS USRS_FULL_NAME_LEN
     , MAX(DATALENGTH(USRS_FULL_NAME)) AS USRS_FULL_NAME_DATALENGTH
FROM pq.USRS
Result from the query above:
+--------------------+---------------------------+
| USRS_FULL_NAME_LEN | USRS_FULL_NAME_DATALENGTH |
+--------------------+---------------------------+
|                 30 |                       250 |
+--------------------+---------------------------+
The LEN function in T-SQL does not count trailing spaces (see the MSFT Docs or this workaround on SO), while DATALENGTH returns the number of bytes actually stored, including trailing spaces. That is why the same column reports a LEN of 30 but a DATALENGTH of 250: the values are padded with trailing spaces, and it is the padded value that the bulk copy tries to squeeze into varchar(100).
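A minimal illustration of the difference, assuming a plain varchar value padded with trailing spaces:

-- LEN ignores trailing spaces; DATALENGTH counts every byte actually stored.
DECLARE @s varchar(300) = 'John Doe' + REPLICATE(' ', 242);

SELECT LEN(@s)        AS len_result,        -- 8
       DATALENGTH(@s) AS datalength_result; -- 250

So a value that looks well within 100 characters to LEN can still be 250 bytes wide once the padding is counted.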
My fix is now to trim the string fields using a Derived Column transformation in the dataflow.
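If trimming in the dataflow is not an option, the trailing spaces could also be removed on the SQL side. A sketch, assuming the source table may be updated directly (the fix actually applied was the Derived Column trim above):

-- Rows that carry trailing spaces: for single-byte varchar, DATALENGTH > LEN means padding.
SELECT USRS_ID, USRS_FULL_NAME, USRS_EMAIL_ADDRESS
FROM pq.USRS
WHERE DATALENGTH(USRS_FULL_NAME)     > LEN(USRS_FULL_NAME)
   OR DATALENGTH(USRS_EMAIL_ADDRESS) > LEN(USRS_EMAIL_ADDRESS);

-- One-time cleanup of the offending rows.
UPDATE pq.USRS
SET USRS_FULL_NAME     = RTRIM(USRS_FULL_NAME),
    USRS_EMAIL_ADDRESS = RTRIM(USRS_EMAIL_ADDRESS)
WHERE DATALENGTH(USRS_FULL_NAME)     > LEN(USRS_FULL_NAME)
   OR DATALENGTH(USRS_EMAIL_ADDRESS) > LEN(USRS_EMAIL_ADDRESS);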
Upvotes: 1