Reputation: 141
We have an on-premises Oracle database from which we need to extract data and store it in a Synapse dedicated pool. I have created a Synapse pipeline that first copies the data from Oracle to a data lake as a parquet file, which is then imported into Synapse by a second copy task. The data is extracted from Oracle through a dynamically created query. This query has two hard-coded INT values that are generated at runtime. The query runs fine and the parquet file is created correctly, but if I use PolyBase or the COPY command to import the file into Synapse, it fails with the following error:
"errorCode": "2200", "message": "ErrorCode=UserErrorSqlDWCopyCommandError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=SQL DW Copy Command operation failed with error 'HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ',Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,Source=.Net SqlClient Data Provider,SqlErrorNumber=106000,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=106000,State=1,Message=HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ClassCastException: ,},],'",
Bulk insert works, but it is less efficient on large quantities of data, so I don't want to use that. The mapping for the copy activities is created dynamically based on the target table definition in the database. However, when I created a separate copy task and imported the mapping to check what is going on, I noticed that the two INT columns are mapped as UTF8 on the parquet source side, while the sink table columns are INT32. When I exclude both columns, the copy task completes successfully. It seems the copy activity fails because it cannot implicitly cast a string to an integer.
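One way to confirm what the parquet file actually contains is to have a serverless SQL pool (if one is available in the workspace) describe the file's inferred schema; the storage path below is a placeholder:

-- Run against a serverless SQL pool, not the dedicated pool; path is a placeholder.
EXEC sp_describe_first_result_set N'
SELECT TOP 1 *
FROM OPENROWSET(
    BULK ''https://<storageaccount>.dfs.core.windows.net/<container>/extract/*.parquet'',
    FORMAT = ''PARQUET''
) AS r';

If Execution_id and Task_id come back as varchar, the file itself stores them as strings, and the problem lies on the Oracle-to-parquet side rather than in the import.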
The two columns are explicitly cast as integers in the Oracle query that is the source for the parquet file:
SELECT t.*
, CAST(419 AS INT) AS "Execution_id"
, CAST(4832 AS INT) AS "Task_id"
, TO_DATE('2022-07-05 14:40:34', 'YYYY-MM-DD HH24:MI:SS') AS "ProcessedDTS"
, t.DEMUTDT AS "EffectiveDTS"
FROM CBO.DRKASTR t
WHERE DEMUTDT >= TO_DATE('2022-07-05 13:37:35', 'YYYY-MM-DD HH24:MI:SS');
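Note that in Oracle, INT is just an alias for NUMBER(38,0), and the copy activity reportedly serializes NUMBER columns whose precision exceeds what it can map to a decimal as strings in parquet. If that is the cause here, a bounded cast might avoid it (an untested sketch; NUMBER(10,0) is an assumed precision):

-- Untested sketch: bound the precision so the values no longer arrive as NUMBER(38).
SELECT t.*
     , CAST(419  AS NUMBER(10,0)) AS "Execution_id"
     , CAST(4832 AS NUMBER(10,0)) AS "Task_id"
     , TO_DATE('2022-07-05 14:40:34', 'YYYY-MM-DD HH24:MI:SS') AS "ProcessedDTS"
     , t.DEMUTDT AS "EffectiveDTS"
FROM CBO.DRKASTR t
WHERE DEMUTDT >= TO_DATE('2022-07-05 13:37:35', 'YYYY-MM-DD HH24:MI:SS');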
Adding an explicit mapping to the Oracle-to-parquet copy activity that maps these columns as INT doesn't solve the problem either. How do I prevent these two columns from being interpreted as strings instead of integers?!
Upvotes: 1
Views: 804
Reputation: 141
We ended up resolving this by first importing the data as strings into the database and casting them to the correct data type during further processing.
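A minimal sketch of that approach, with hypothetical table names (stg.DRKASTR as the copy sink, dbo.DRKASTR as the final table) and assumed column types:

-- Hypothetical staging table: the two problem columns land as strings.
CREATE TABLE stg.DRKASTR
(
    Execution_id VARCHAR(20),
    Task_id      VARCHAR(20),
    ProcessedDTS DATETIME2,
    EffectiveDTS DATETIME2
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- During further processing, cast to the proper types in the target table.
INSERT INTO dbo.DRKASTR (Execution_id, Task_id, ProcessedDTS, EffectiveDTS)
SELECT TRY_CAST(Execution_id AS INT),
       TRY_CAST(Task_id AS INT),
       ProcessedDTS,
       EffectiveDTS
FROM stg.DRKASTR;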
Upvotes: 0