Reputation: 87
I'm trying to bulk insert a file in SQL Server 2017 (14.0.1000.169). I would like to take the file exactly as it arrives, save it to the desired location, and then run the bulk insert query without having to modify the file at all. I'm having difficulties getting the script to recognize and ignore the double quotes in the text file unless I manually change the line endings from Unix to Windows. I have read quite a few topics here and outside of SO discussing subjects close to this one; alas, none of them gave me the answer to my question:
How do I bulk insert my file with Unix line endings and not end up with double quotes?
My file looks like this:
"Report Name","Daily Extract (ID: 111111)"
"Date/Time Generated(UTC)","01-Mar-2020 15:08:51"
"Workspace Name","Company (ID: 22222)"
"Account Name","Client Account"
"Date Range","01-Jan-2019 - 29-Feb-2020"
"Dimension 1","Dimension 2","Dimension 3","Dimension 4","Dimension 5","Dimension 6","Dimension 7","Dimension 8","Dimension 9","Dimension 10","Dimension 11","Dimension 12","Dimension 13","Dimension 14","Dimension 15","Dimension 16","Dimension 17","Metric 1","Metric 2","Metric 3","Metric 4","Metric 5","Metric 6","Metric 7","Metric 8","Metric 9","Metric 10","Metric 11","Metric 12"
"string","string","date as string","string","string","string","string","string","string","string","string","string","string","string","string","string","string","bigint","bigint","decimal","decimal","decimal","bigint","decimal","decimal","bigint","decimal","bigint","bigint"
The query I'm using is as follows:
DROP TABLE IF EXISTS [Table]
GO
CREATE TABLE [dbo].[Table](
[Dimension 1] [varchar] (255) NULL,
[Dimension 2] [varchar] (255) NULL,
[Dimension 3] [varchar] (255) NULL,
[Dimension 4] [varchar] (255) NULL,
[Dimension 5] [varchar] (255) NULL,
[Dimension 6] [varchar] (255) NULL,
[Dimension 7] [varchar] (255) NULL,
[Dimension 8] [varchar] (255) NULL,
[Dimension 9] [varchar] (1000) NULL,
[Dimension 10] [varchar] (255) NULL,
[Dimension 11] [varchar] (255) NULL,
[Dimension 12] [varchar] (255) NULL,
[Dimension 13] [varchar] (1000) NULL,
[Dimension 14] [varchar] (1000) NULL,
[Dimension 15] [varchar] (1000) NULL,
[Dimension 16] [varchar] (1000) NULL,
[Dimension 17] [varchar] (1000) NULL,
[Metric 1] [varchar] (50) NULL,
[Metric 2] [varchar] (50) NULL,
[Metric 3] [varchar] (50) NULL,
[Metric 4] [varchar] (50) NULL,
[Metric 5] [varchar] (50) NULL,
[Metric 6] [varchar] (50) NULL,
[Metric 7] [varchar] (50) NULL,
[Metric 8] [varchar] (50) NULL,
[Metric 9] [varchar] (50) NULL,
[Metric 10] [varchar] (255) NULL,
[Metric 11] [varchar] (50) NULL,
[Metric 12] [varchar] (50) NULL
) ON [PRIMARY]
GO
BULK INSERT [Table]
FROM 'C:\Users\username\Folder\File.csv'
WITH
(
--FORMAT = 'CSV',
DATAFILETYPE = 'char',
FIELDTERMINATOR = ',',
--ROWTERMINATOR = '\n',
ROWTERMINATOR = '0x0a',
FIRSTROW = 7,
--FIELDQUOTE = '"'
FIELDQUOTE = '0x22'
)
;
As you can see above, I'm importing everything as varchar. Originally I only did this for one metric (due to data quality issues on the supply end), as I fully intend to correct every blemish after the file has been loaded. Having run into difficulties, however, I have set all metrics to varchar, so that at least the file would load and I could see what it looks like and dig further.
So far I have tried the following:

- leave the file untouched and run the above script with a literal double quote instead of 0x22 for FIELDQUOTE - this works, but the end result is that every value is wrapped in double quotes
- leave the file untouched and run the above script as it is (i.e. using 0x22 for the FIELDQUOTE) - again, it works, but with double quotes everywhere
Every other thing I have tried so far resulted in various errors, all of which lead back to the same two things: either I can't use FORMAT = 'CSV' (if I leave the Unix line endings in), or the moment I try loading metrics as float, it errors out because of the double quotes.
I do have a workaround for the time being (I can remove the double quotes and convert the fields after the file has loaded). I do wonder, however, whether I can integrate that step into the bulk insert (as I did when I loaded the file with Windows line endings).
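For reference, the post-load cleanup I have in mind is roughly the following (just a sketch against the table above, with illustrative target types - not my exact script):

```sql
-- Strip the wrapping double quotes left behind by the bulk insert;
-- repeat for each affected column.
UPDATE [Table]
SET [Metric 1] = REPLACE([Metric 1], '"', ''),
    [Metric 3] = REPLACE([Metric 3], '"', '');

-- TRY_CAST (SQL Server 2012+) yields NULL instead of erroring on bad values,
-- which helps while the supply-end data quality issues persist.
SELECT TRY_CAST([Metric 1] AS bigint)         AS [Metric 1],
       TRY_CAST([Metric 3] AS decimal(18, 4)) AS [Metric 3]
FROM [Table];
```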
N.B. I am aware that FIELDQUOTE hasn't been around for long; it should, however, apply to my build, as per Microsoft:
"FIELDQUOTE = 'field_quote' Applies to: SQL Server 2017 (14.x) CTP 1.1. Specifies a character that will be used as the quote character in the CSV file. If not specified, the quote character (") will be used as the quote character as defined in the RFC 4180 standard."
Did I forget to disclose anything? If not, any ideas what I might have overlooked?
Thanks in advance!
Upvotes: 2
Views: 4352
Reputation: 1
**Filename** = C:\Users\hp\Desktop\dataset\canvas_size.csv
**File format**: 'CSV'
**Data is as follows:**
size_id,width,height,label
20,20,,20" Long Edge
24,24,,24" Long Edge
30,30,,30" Long Edge
36,36,,36" Long Edge
40,40,,40" Long Edge
48,48,,48" Long Edge
56,56,,56" Long Edge
1522,15,22,15" x 22"(38 cm x 56 cm)
1618,16,18,16" x 18"(41 cm x 46 cm)
1620,16,20,16" x 20"(41 cm x 51 cm)
**create table script**
create table canvas_size(size_id int,width int,height int,label varchar(200));
**Executing a select query on the table shows no rows.**
select * from canvas_size;
**MS SQL Server bulk load script**
BULK INSERT canvas_size FROM 'C:\Users\hp\Desktop\dataset\canvas_size.csv'
WITH (
FORMAT = 'CSV',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
);
**You will get the below error:**
Msg 4879, Level 16, State 1, Line 18
Bulk load failed due to invalid column value in CSV data file C:\Users\hp\Desktop\dataset\canvas_size.csv in row 2, column 4.
Msg 7399, Level 16, State 1, Line 18
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 18
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
**Solution:**
The problem is that the data in the label column contains double quotes, so the bulk load script I wrote above was not correct.
**Modified bulk load script**
BULK INSERT canvas_size FROM 'C:\Users\hp\Desktop\dataset\canvas_size.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2,
DATAFILETYPE = 'char'
);
The data is now successfully loaded into the table (canvas_size):
select * from canvas_size;
size_id width height label
20 20 NULL 20" Long Edge
24 24 NULL 24" Long Edge
30 30 NULL 30" Long Edge
36 36 NULL 36" Long Edge
40 40 NULL 40" Long Edge
48 48 NULL 48" Long Edge
56 56 NULL 56" Long Edge
1522 15 22 15" x 22"(38 cm x 56 cm)
1618 16 18 16" x 18"(41 cm x 46 cm)
1620 16 20 16" x 20"(41 cm x 51 cm)
Upvotes: 0
Reputation: 95924
OK. The biggest problem here is your file. Firstly, the file does not conform to RFC 4180, due to the rows at the top. This makes for a headache.
Next is the important caveat on FIRSTROW:
When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
Notice this says field terminators, not row terminators. This is the second problem. For your data, you have this at the start:
"Report Name","Daily Extract (ID: 111111)"
"Date/Time Generated(UTC)","01-Mar-2020 15:08:51"
"Workspace Name","Company (ID: 22222)"
"Account Name","Client Account"
"Date Range","01-Jan-2019 - 29-Feb-2020"
<-- Blank Line -->
This is 6 field terminators, and 6 row terminators.
Next, you have more columns in the CSV file than in the table Table; Table doesn't have a column Dimension 17.
After adding this missing column, I managed to get the results I believe you are after with the below:
BULK INSERT [Table]
FROM '/tmp/YourFile2.txt'
WITH (FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2,
FORMAT = 'CSV',
FIELDQUOTE = '"');
This inserted 1 row into the table.
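As an aside, on builds where FORMAT = 'CSV' is not available, a common workaround for fully quoted files is to fold the quotes into the terminators themselves. A rough sketch (untested against your exact file, and it assumes every field is quoted):

```sql
BULK INSERT [Table]
FROM '/tmp/YourFile2.txt'
WITH (FIELDTERMINATOR = '","',   -- consumes the quote-comma-quote between fields
      ROWTERMINATOR = '0x220a',  -- closing quote + LF ends each row
      FIRSTROW = 2);

-- Only the leading quote of the first column survives; strip it afterwards.
UPDATE [Table]
SET [Dimension 1] = REPLACE([Dimension 1], '"', '');
```

The hex ROWTERMINATOR matters here: the literal '\n' is silently treated as \r\n by BULK INSERT, whereas 0x0a (or 0x220a) matches a bare Unix line ending.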
Upvotes: 1