Reputation: 2219
I am moving data within folder from Azure Data Lake to a SQL Server using Azure Data Factory (ADF).
The folder contains hundreds of .csv files. However, one inconsistent problem with these csv's is that some (not all) have a final row that contains a special character, which when trying to load to a sql table with datatypes other than NVARCHAR(MAX) will fail. To get around this, I have to first use ADF to load the data into staging tables where all columns are set to NVARCHAR(MAX), then I insert those rows that do not contain a special character into tables that have the appropriate data type.
This is a weekly process, and is over a terabyte of data and it takes forever to move the data so I am looking into ways to import into my final tables rather than having a staging component.
I notice that there is a 'pre-copy script' field that can execute before the load to sql server. I want to add code that will allow me to parse out special characters OR null rows before loading to sql server.
I am unsure of how to approach this since the csv's would not be stored in a table, so SQL code wouldn't work. Any guidance on how I can utilize the pre-copy script to clean my data before loading it into sql server?
Upvotes: 1
Views: 8730
Reputation: 2363
You could consider stored procedure. https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#invoking-stored-procedure-for-sql-sink
Upvotes: 0
Reputation: 3209
The pre-copy script is a script that you run against the database before copying new data in, not to modify the data you are ingesting.
I already answered this on another question, providing a possible solution using an intermediate table: Pre-copy script in data factory or on the fly data processing
Hope this helped!
Upvotes: 1