Reputation: 349
I'm working with a double-pipe-delimited file which I want to ingest into Azure SQL DB via Data Factory (v2):
Column1||Column2||Column3
In my input dataset, I specified the double pipe delimiter as columnDelimiter in the format section of typeProperties:
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": "||",
"skipLineCount": 2
}
Unfortunately, this is not working. All rows are marked as "incompatible" and no data is ingested into my SQL table. I then tried a few things. When not specifying any columnDelimiter, the complete row is ingested into the first column of my SQL table, so no delimiter is considered at all. This is not correct; logically, each data item must end up in a separate column. Writing the || as Unicode escapes also did not fix it. When I convert my input file to single-pipe-delimited, it works fine. However, in the end state I'm not able to alter the input file, so handling the double pipe delimiter is essential.
Upvotes: 2
Views: 3395
Reputation: 14379
You could either ask the creators of your input file to use a single delimiter (like most people do) or else you have to do some pre-processing. For example, you could do this with a U-SQL step which corrects the file, and then import the corrected file into your SQL DB, e.g. some simple U-SQL:
// Multiple column delimiters
@input =
    EXTRACT
        col1 string
    FROM "/input/input91.txt"
    USING Extractors.Text(delimiter : '~', skipFirstNRows : 1);

// Do some other processing here?
@output =
    SELECT col1.Replace("||", "|") AS col2
    FROM @input;

OUTPUT @output
TO "/output/output.txt"
USING Outputters.Text(delimiter : '|', quoting : false);
NB Here I've used a delimiter (tilde "~") which I know won't occur in the input file, so the whole row is effectively imported as one column, which can then be fixed up.
Is there a particular reason you use two delimiters?
Also, if your file is on blob storage, then you can now BULK INSERT into Azure SQL DB, and having just given this a quick test, BULK INSERT can cope with a multi-character delimiter, e.g.
DROP TABLE IF EXISTS #tmp;

CREATE TABLE #tmp (
    a VARCHAR(50),
    b VARCHAR(50),
    c VARCHAR(50)
);
GO

BULK INSERT #tmp
FROM 'D:\Data Lake\USQLDataRoot\input\input91.txt'
WITH ( FIELDTERMINATOR = '||', DATA_SOURCE = '<yourDataSource>' );
See the BULK INSERT documentation for more info on the setup steps, which involve creating an external data source and a credential. You could then have Data Factory execute it via a Stored Procedure activity, for example.
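A minimal sketch of that setup, assuming the file sits in a blob container you can reach with a SAS token; the credential, data source, procedure and table names below are placeholders, not part of the original answer:

-- Sketch only: placeholder names and secrets; run once in the target Azure SQL DB.
-- A database master key is needed before a database scoped credential can be created.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strongPassword>';

-- SAS token for the container holding the input file (omit the leading '?').
CREATE DATABASE SCOPED CREDENTIAL MyBlobCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sasToken>';

-- External data source pointing at the container; BULK INSERT paths are then relative to it.
CREATE EXTERNAL DATA SOURCE MyBlobDataSource
WITH ( TYPE = BLOB_STORAGE,
       LOCATION = 'https://<storageAccount>.blob.core.windows.net/<container>',
       CREDENTIAL = MyBlobCredential );

-- Wrap the load in a procedure so a Data Factory Stored Procedure activity can call it.
CREATE OR ALTER PROCEDURE dbo.LoadDoublePipeFile
AS
BEGIN
    BULK INSERT dbo.MyTargetTable               -- placeholder target table
    FROM 'input/input91.txt'                    -- path relative to the data source location
    WITH ( FIELDTERMINATOR = '||', DATA_SOURCE = 'MyBlobDataSource' );
END;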
Upvotes: 1
Reputation: 23782
Based on the rules for the TextFormat dataset, only a single character is allowed as the columnDelimiter. I even tried to work around it with an escape delimiter, but that is not allowed either.
So it seems you need to process your data before the copy activity. Since ADF supports copying data from an HTTP endpoint, I suggest the following steps:
1. Expose your source file through an HTTP endpoint (for example, a small service that reads the file).
2. Replace all the '||' with '|' and return the result in the response (don't forget the '\n').
3. In ADF, set the column delimiter to '|' and copy the data from the HTTP endpoint.
This solution can handle large amounts of data. Of course, if you do not want to do such complicated work, you could process your data directly and save it as a temporary file for ADF.
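If you go the temporary-file route, a minimal sketch of the pre-processing could look like this in Python (file names here are just examples, not from the original post):

# Rewrite '||' to '|' so ADF can use a single-character column delimiter.
# Input/output file names are placeholders.
with open("input91.txt", "r", encoding="utf-8") as src, \
     open("input91_singlepipe.txt", "w", encoding="utf-8") as dst:
    for line in src:                          # iterating by line keeps the '\n' row delimiter intact
        dst.write(line.replace("||", "|"))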
Hope it helps you.
Upvotes: 1