Dai
Dai

Reputation: 155145

SQL Server BULK INSERT - Inserting DateTime values

I have 6 million rows worth of data I want to insert into my SQL Server database. I can do it the slow way with 6 million INSERT statements (by my calculation it would take 18 hours to run) or I can try BULK INSERT.

BULK INSERT has issues with not being able to escape characters, but the data in this case is very simple and so shouldn't run into this problem.

However SQL Server doesn't seem to like any form of date/time data to be inserted into a field.

Here is the table (psuedo-SQL)

CREATE TABLE Tasks (
    TaskId bigint NOT NULL IDENTITY(1,1) PRIMARY KEY,
    TriggerId bigint NOT NULL FOREIGN KEY,
    Created datetime NOT NULL,
    Modified datetime NOT NULL,
    ScheduledFor datetime NULL,
    LastRan datetime NULL,
    -- and about 10 more fields after this
)

Here is my BULK INSERT statement:

SET DATEFORMAT dmy
BULK INSERT Tasks
FROM 'C:\TasksBulk.dat'
WITH (
    -- CHECK_CONSTRAINTS is not necessary as the only constraints are always enforced regardless of this option (UNIQUE, PRIMARY KEY, and NOT NULL)
    CODEPAGE = 'RAW',
    DATAFILETYPE = 'native',

    KEEPIDENTITY,
    MAXERRORS = 1,
    ORDER ( CallId ASC ),

    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\0'
)

And here is the first row of data in TasksBulk.dat:

1000\t1092\t01/01/2010 04:00:17\t01/01/2010 04:00:17\t\t01/01/2010 04:00:14\0

(For readability, reformatted with tab characters replaced with 4-spaces:)

1000    1092    01/01/2010 04:00:17    01/01/2010 04:00:17        01/01/2010 04:00:14\0

However when I run the BULK INSERT statement, I get this error:

Msg 4864, Level 16, State 1, Line 2 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 3 (Created).

I have tried using different row and field terminators and every different date/time format (including "01/01/2010", "2010-01-01", both with and without the "04:00:17" time component). I don't know what I'm doing wrong here.

Upvotes: 7

Views: 35011

Answers (3)

meltdownmonk
meltdownmonk

Reputation: 493

A method I'm familiar with is to insert your dates in the form of an integer.

I use the number of seconds starting from a certain date (I use one over 10 years in the past, since there isn't any data I would be accessing or generating that is older than that)

The date 2012-01-02 12:15:10.000 would be stored as 378637886, using the reference point of January 1st, 2000.

When querying the database, you could have the column returned using DateAdd(SS, column_name, '2000-01-01').

You could do this in millieseconds too if that kind of precision was needed.

I use my own custom function to convert my time in seconds into whatever format I would like, and I use another custom function to turn dates back into seconds.

I realize this may not be a good method because it could require database changes and code changes for you, but perhaps it could be a solution concept that others would find useful.

Upvotes: 0

Dai
Dai

Reputation: 155145

It turns out that changing the DATAFILETYPE from 'native' to 'char' solved the problem. The 'native' type implies a strict data format for everything, whereas 'char' is meant for more plaintext files.

Upvotes: 6

Mitch Wheat
Mitch Wheat

Reputation: 300559

You have your CODDEPAGE set to RAW (presumably for speed).

The error message implies your data contains characters outside the codepage.

CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ]

Specifies the code page of the data in the data file. CODEPAGE is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.

But that could be misleading. Your example data line contains a missing column. If you don't use a format file, every field in the table must be used.

So you could either create a format file or create a staging table with varchar(25) for the datetime columns, importing and then perform an update from the staging table into the destination table. That way you have more control over conversions and missing data.

Upvotes: 1

Related Questions