Reputation: 3025
I have the following data file in CSV format - https://www.dropbox.com/s/23dtr31pje15baz/yellow-dentist-chicago%2C%20il.csv?dl=0
A few of the rows have data which is enclosed in quotation marks to indicate that the commas within the data are part of the data and not a delimiter (see "Bruce C. Gronner, DDS" in the 5th row for the first example). As such, it is my understanding that I will need to use a format file as discussed in this post.
As a start, I have come up with this
11.0
13
1 SQLCHAR 0 1000 "\"," 1 Name SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 1000 "\"," 2 Address SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 1000 "\"," 3 State SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 1000 "\"," 4 Phone SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 1000 "\"," 5 Rating SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 1000 "\"," 6 Reviews SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 1000 "\"," 7 Website SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 1000 "\"," 8 Email1 SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 0 1000 "\"," 9 MerchantVerified SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 1000 "\"," 10 Lat SQL_Latin1_General_CP1_CI_AS
12 SQLCHAR 0 1000 "\"," 11 Long SQL_Latin1_General_CP1_CI_AS
12 SQLCHAR 0 1000 "\"," 12 ListingURL SQL_Latin1_General_CP1_CI_AS
13 SQLCHAR 0 1000 "\r\n" 13 Email2 SQL_Latin1_General_CP1_CI_AS
I am then trying to use the following SQL code to import
BULK INSERT RawData
FROM 'C:\Users\William\Dropbox\yellow-dentist-chicago, il'
WITH
(
FIRSTROW = 2,
FORMATFILE='C:\Users\William\Dropbox\formatfile.fmt'
);
The primary issue I am having is that only a small subset of the data is enclosed in quotation marks (only when a comma is present in the data) so I don't know how to format the delimiter pattern column appropriately in the format file. Any help would be appreciated.
Upvotes: 1
Views: 2901
Reputation: 1562
You need minor changes to your format file.
To limit amount of data, here is the data I have the format for:
Name, Address, Email
Somename,"Address one",[email protected]
Somename2,"Address2 Two",[email protected]
As you can see the Address column is using double quotes and the Name and Email does not.
The format to use would be:
10.0
3
1 SQLCHAR 0 50 ",\"" 1 Name SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 "\"," 2 Address SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 Email2 SQL_Latin1_General_CP1_CI_AS
The Name colume delimiter would be ,"
and in the format file would be ",\""
The Address colume delimiter would be ",
and in the format file would be `"\","
Upvotes: 2
Reputation: 63
The quickest way i can think of doing this (there's other ways I'm sure) you could format the csv in excel and add a unique symbol as a field terminator (ill use vertical bar in this case) to the start and end of each cell see this link here on how to do it:
http://www.lenashore.com/2012/04/how-to-add-quotes-to-your-cells-in-excel-automatically/
You will want to added |@| as the custom format for the cells instead of quotes.
Now you should be able to run the BULK Insert like this to import the data:
BULK INSERT RequiredDB.DBO.RequiredTable
FROM '%Drive%\%Folder%\yellow-dentist-chicago-il.csv'
WITH
(
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\r\n'
);
Also i presume that's dummy data or public information that you posted in the csv and not confidential or private data.
Upvotes: 0