ERock
ERock

Reputation: 21

SQL Bulk Insert Error because Last Field Contains Carriage Returns

thanks in advance for your help.

I am trying to import a flat file into my SQL server database. I am getting truncation errors. The problem I'm having (I think) is the last field of the file is a free form text field which appears to allow carriage returns within the field. So I end up having {CR}{LF} as a Row Terminator, but sometimes the {CR}{LF} is really just a new line within the free form text in the last field.

Does anyone have any idea how to load this data in? Thanks for your help.

Here's my code:

DROP TABLE   #SHELL
CREATE TABLE #SHELL
(
Field1 varchar(50)
,Field2 varchar(50)
,Field3 varchar(50)
,Field4 varchar(50)
,Field5 varchar(50)
,Field6 varchar(160)
,Field7 varchar(50)
,Field8 varchar(50)
,COMMENTS nvarchar(MAX)
 )

BULK INSERT #SHELL
FROM 
'[thefilepath].txt'
WITH
(
FIRSTROW = 1,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)

Here's a picture of the table as viewed in Notepad++ so you can see the carriage returns and line feeds.

Table Sample

Upvotes: 0

Views: 2459

Answers (3)

onupdatecascade
onupdatecascade

Reputation: 3366

If you want to be deliberate about it, I'd pre-process the text file to remove those "mid field" line terminators, or, if they are important to keep, substitute another escape character for them.

For example, create a script that takes each line in the file and attempts to split it on the column delimiter, and compare how many tokens you get back - to determine if a line is really a complete line with all expected columns.

For any line that has fewer tokens than expected, merge it with the previous line, optionally inserting your escape character into that last column.

Write the results out to a new file, so every DB record is truly one line.

After import, optionally, use an update statement to put the line terminators back into that last field. Example, if you replaced them with, say, a | then update to replace | with \n in SQL.

Example: "records" in the file might be:

value value1 value2 some long text
wraps to another line and
yet another line
value value1 value2 a better behaved record on one line

That's two records, but four lines. In preprocessing the file, make it:

value value1 value2 some long text|wraps to another line and|yet another line
value value1 value2 a better behaved record on one line

Then the records are actually one per line in the file, and every CRLF is the end of a line AND the end of a record. Bulk import works, but the value of the last column will be

some long text|wraps to another line and|yet another line

Perform an update replacing | by CRLF in SQL to return to an accurate representation of what was in the file

Upvotes: 1

chezy525
chezy525

Reputation: 4174

Bulk insert alone can't really handle this. So, there are two options:

  1. Pre-process the file so that each record is only on a single line. Basically, you'll need to replace each mid-field CRLF with some other character/string.
  2. Load the file into a temp table of a single data column and then use SQL to combine the mid-field CRLF and eventually get to your final table. NOTE: SQL is generally really bad at this (splitting strings, etc), you can use some libraries to help, but personally, I'd try to avoid this solution.

Upvotes: 0

S3S
S3S

Reputation: 25142

Your image didn't come through, but when the export or file came from a LINUX system often you need to use HEX. Try this:

WITH (FIELDTERMINATOR = '\t',FIRSTROW=1,ROWTERMINATOR = '0x0a')

Upvotes: 1

Related Questions