bmsqldev

Reputation: 2735

Error in SSIS packages loading data into Azure SQL Data Warehouse

We have some SSIS packages loading data into Azure SQL Data Warehouse from CSV files. All the data flow tasks inside the packages are configured for parallel processing. Recently the packages started failing with the following error:

Failed to copy to SQL Data Warehouse from blob storage. 110802;An internal DMS error occurred that caused this operation to fail. Details: Exception: System.NullReferenceException, Message: Object reference not set to an instance of an object.

When we run the package manually (running each DFT individually), it runs fine. When we run the package as is (with parallel processing), the same error occurs.

Can anyone help find the root cause of this issue?

Upvotes: 1

Views: 379

Answers (2)

wBob

Reputation: 14379

IMHO it's a mistake to use an SSIS Data Flow to insert data into Azure SQL Data Warehouse. There were problems with the drivers early on which made performance horrendously slow, and even though these may now have been fixed, the optimal method for importing data into Azure SQL Data Warehouse is PolyBase. Place your csv files into blob store or Data Lake, then reference those files using PolyBase and external tables. Optionally, then import the data into internal tables using CTAS, e.g. pseudocode:

csv -> blob store -> PolyBase -> external table -> CTAS to internal table
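
A minimal T-SQL sketch of that flow (the credential, data source, container, table names, and columns below are placeholders for illustration, not your actual schema):

    -- One-time setup: credential and external data source over the blob container
    CREATE MASTER KEY;

    CREATE DATABASE SCOPED CREDENTIAL BlobCredential
    WITH IDENTITY = 'blobuser',
         SECRET = '<storage-account-access-key>';

    CREATE EXTERNAL DATA SOURCE AzureBlob
    WITH (
        TYPE = HADOOP,
        LOCATION = 'wasbs://mycontainer@myaccount.blob.core.windows.net',
        CREDENTIAL = BlobCredential
    );

    CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (
        FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
    );

    -- External table: a schema over the csv files; no data is moved yet
    CREATE EXTERNAL TABLE ext.Sales (
        SaleId   INT,
        SaleDate DATE,
        Amount   DECIMAL(18,2)
    )
    WITH (
        LOCATION = '/sales/',
        DATA_SOURCE = AzureBlob,
        FILE_FORMAT = CsvFormat
    );

    -- CTAS: PolyBase loads the data into an internal table
    CREATE TABLE dbo.Sales
    WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)
    AS SELECT * FROM ext.Sales;

Because PolyBase reads the files from blob storage in parallel across the distributions, this typically loads far faster than pushing rows through the SSIS data flow.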

If you must use SSIS, consider using only the Execute SQL Task in more of an ELT-type approach, or use the Azure SQL DW Upload Task, which is part of the Azure Feature Pack for SSIS.

Work through this tutorial for a closer look at this approach:

https://learn.microsoft.com/en-us/azure/sql-data-warehouse/design-elt-data-loading

Upvotes: 0

Prashant Mishra

Reputation: 26

I believe this problem may occur if multiple jobs try to access the same file at exactly the same time. Check whether one CSV file is the source for multiple SSIS packages; if so, you may need to change your approach. When a package reads a CSV file, it locks the file so that other jobs can't modify it. To avoid this problem, run the DFTs that share the same CSV source sequentially, and keep the other DFTs in parallel as they are.

Upvotes: 1
