Reputation: 1984
I have a lot of flat (text) files that I want to import daily into SQLSERVER tables. For now, while I work out my procedures, I just want to import a single file. Of course I can write c# code to do this, but I feel this is not the right way to go and I would like to use something like bulk insert with xml format files.
My first sample file looks something like this (sample.dat):
Q RR201110010000000002000000000000232000
N X4201110010000000001500000000000160000
Note: A hexdump on this file reveals that each line is terminated with exactly a newline character - nothing more or less.
My xml translation file looks like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="6"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="2"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="8"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="14"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="14"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="c1" xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="2" NAME="c2" xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="3" NAME="c3" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="4" NAME="c4" xsi:type="SQLINT" />
<COLUMN SOURCE="5" NAME="c5" xsi:type="SQLINT" />
</ROW>
</BCPFORMAT>
My query looks like this:
SET LANGUAGE us_english;
GO
SET DATEFORMAT ymd;
go
BULK INSERT
PROJ.dbo.Costs
FROM 'C:\somewhere\test01\SAMPLE.DAT'
WITH
(
DATAFILETYPE ='CHAR',
FORMATFILE='C:\somewhere\test01\TRANSLATE02.XML',
ERRORFILE='C:\somewhere\test01\ERRORS.TXT',
ROWTERMINATOR='\n'
)
GO
When I run this script, I get an overflow error starting with row 2. (That is, row 1 appears to have translated correctly, although I do not see it in the sql table.) A hexdump of ERRORS.TXT reveals that the first error line (row 2) begins with the newline! Of course, that would cause the 4th field to overflow! So it appears the script is not understanding the ROWTERMINATOR. I tried '\n', '\r', '\r\n', '\n\r' just in case it wasn't seeing the \r. No avail.
I also attempted a slightly different sql command as per bulk insert txt error with ROWTERMINATOR
and got the same error.
Any thoughts on what I'm missing?
As requested, here's a munged hex dump of sample.dat:
000000: 41 20 20 20 20 20 XX XX 32 30 31 31 31 30 30 31 Q RR20111001
000010: 30 30 30 30 30 30 30 30 30 31 35 30 30 30 30 30 0000000001500000
000020: 30 30 30 30 30 30 31 35 30 30 30 30 0A ZZ 20 20 000000150000.N
000030: 20 20 20 XX XX 32 30 31 31 31 30 30 31 30 30 30 X420111001000
000040: 30 30 30 30 30 30 32 30 30 30 30 30 30 30 30 30 0000002000000000
000050: 30 30 30 32 33 32 30 30 30 0A ZZ 20 20 20 20 20 000232000.Y
note that XX and ZZ are masked (not real data) the 0A is the linefeed character and it's the only thing between the last zero (hex 30) and the ZZ character that starts the next line. Hopefully this is not too confusing.
My solution below works, however, this problem is also discussed here and the solution seems better to me (though I haven't confirmed it, I think I'll try it with the next file). Bulk insert rowterminator issue
Upvotes: 2
Views: 5248
Reputation: 1097
The SQL Server article on XML schema files for fixed-format text has the terminator specified in the RECORD element:
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="10"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="6"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"
</RECORD>
(Beware the typo in the above.) Perhaps this is why your specification in the query is ignored.
Upvotes: 1
Reputation: 1984
The answer (well, at least one answer) is embarrassingly simple.
I just added a bogus, one character field to the list of FIELDS in the XML.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="6"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="2"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="8"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="14"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="14"/>
<FIELD ID="6" xsi:type="CharFixed" LENGTH="1"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="c1" xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="2" NAME="c2" xsi:type="SQLNCHAR"/>
<COLUMN SOURCE="3" NAME="c3" xsi:type="SQLCHAR"/>
<COLUMN SOURCE="4" NAME="c4" xsi:type="SQLINT" />
<COLUMN SOURCE="5" NAME="c5" xsi:type="SQLINT" />
</ROW>
</BCPFORMAT>
Note I did not write that last field (with a corresponding COLUMN tag). This reads the EOL (/n) into a dummy field. If this is not a bug in the behavior of ROWTERMINATOR in the BULK INSERT command, then it is at least highly unintuitive. That is, the ROWTERMINATOR appears to be a NOOP.
Observation 1: While column 3 is a date in YYYYMMDD format, the corresponding SOURCE 3 is actually a SMALLDATETIME. It automatically converts it correctly.
Observation 2: SOURCES 3 and 4 are defined as decimal(14, 2). I thought that would scale the inputs to use the last 2 chars in the corresponding fields as hundredths. I can either a. find a way to scale automatically (preferred) or b. do post-processing to divide by 100. (That's another question - just noting it here, as it seems interesting to me.)
Either way, this appears to be ONE solution for the problem. Thanks for the responses.
Addendum (as an aside): I have decided to go with option b (as noted in Observation 2), using an UPDATE SET command at the end of the sql commands to divide the money fields by 100.
The final product will be a batch file that invokes "sqlcmd" a number of times - and then runs a perl script at the tail end to check for entries in the various error files.
One additional thing: I note that the error file listed in the BULK INSERT command have to be non-existent when I run this; otherwise, that itself will produce a different error! I'll take care of that during the pre-processing.
Anyway, thanks again.
Upvotes: 1
Reputation: 11571
You Must Use SSIS "SQL Server Integration Services" For Convert Data From File To Your Database. and You Can Job This Convert In SQL Server for Convert Each Day Automatically.
Upvotes: 1
Reputation: 5234
Try char(13)
, it's the SQL carriage return. Also char(13) + char(10)
, carriage return/line feed.
Upvotes: 0