zgall1
zgall1

Reputation: 3025

Bulk insert to SQL Server using a text qualifer in a format file

I have the following data file in CSV format - https://www.dropbox.com/s/23dtr31pje15baz/yellow-dentist-chicago%2C%20il.csv?dl=0

A few of the rows have data which is enclosed in quotation marks to indicate that the commas within the data are part of the data and not a delimiter (see "Bruce C. Gronner, DDS" in the 5th row for the first example). As such, it is my understanding that I will need to use a format file as discussed in this post.

As a start, I have come up with this

11.0
13
1   SQLCHAR   0  1000   "\","      1     Name         SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0  1000   "\","       2     Address         SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0  1000   "\","       3     State     SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0  1000   "\","       4     Phone     SQL_Latin1_General_CP1_CI_AS
5   SQLCHAR   0  1000   "\","       5     Rating         SQL_Latin1_General_CP1_CI_AS
6   SQLCHAR   0  1000   "\","       6     Reviews         SQL_Latin1_General_CP1_CI_AS
7   SQLCHAR   0  1000   "\","       7     Website         SQL_Latin1_General_CP1_CI_AS
8   SQLCHAR   0  1000   "\","       8     Email1         SQL_Latin1_General_CP1_CI_AS
9   SQLCHAR   0  1000   "\","       9     MerchantVerified         SQL_Latin1_General_CP1_CI_AS
10   SQLCHAR   0  1000   "\","       10     Lat         SQL_Latin1_General_CP1_CI_AS
12   SQLCHAR   0  1000   "\","       11     Long         SQL_Latin1_General_CP1_CI_AS
12   SQLCHAR   0  1000   "\","       12     ListingURL         SQL_Latin1_General_CP1_CI_AS
13   SQLCHAR   0  1000   "\r\n"      13     Email2         SQL_Latin1_General_CP1_CI_AS

I am then trying to use the following SQL code to import

BULK INSERT RawData
FROM 'C:\Users\William\Dropbox\yellow-dentist-chicago, il'
WITH 
(
  FIRSTROW = 2,
  FORMATFILE='C:\Users\William\Dropbox\formatfile.fmt'
);

The primary issue I am having is that only a small subset of the data is enclosed in quotation marks (only when a comma is present in the data) so I don't know how to format the delimiter pattern column appropriately in the format file. Any help would be appreciated.

Upvotes: 1

Views: 2901

Answers (2)

Tak
Tak

Reputation: 1562

You need minor changes to your format file.

To limit amount of data, here is the data I have the format for:

Name, Address, Email Somename,"Address one",[email protected] Somename2,"Address2 Two",[email protected]

As you can see the Address column is using double quotes and the Name and Email does not.

The format to use would be:

10.0
3
1   SQLCHAR   0  50    ",\""      1     Name         SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0  50    "\","      2     Address      SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0  50    "\r\n"     3     Email2       SQL_Latin1_General_CP1_CI_AS

The Name colume delimiter would be ," and in the format file would be ",\""

The Address colume delimiter would be ", and in the format file would be `"\","

Upvotes: 2

Tim Partridge
Tim Partridge

Reputation: 63

The quickest way i can think of doing this (there's other ways I'm sure) you could format the csv in excel and add a unique symbol as a field terminator (ill use vertical bar in this case) to the start and end of each cell see this link here on how to do it:

http://www.lenashore.com/2012/04/how-to-add-quotes-to-your-cells-in-excel-automatically/

You will want to added |@| as the custom format for the cells instead of quotes.

Now you should be able to run the BULK Insert like this to import the data:

BULK INSERT RequiredDB.DBO.RequiredTable
FROM '%Drive%\%Folder%\yellow-dentist-chicago-il.csv'
WITH
   (
   FIELDTERMINATOR = '|',
   ROWTERMINATOR = '\r\n'
   );

Also i presume that's dummy data or public information that you posted in the csv and not confidential or private data.

Upvotes: 0

Related Questions