Phillip Deneka
Phillip Deneka

Reputation: 241

Importing CSV with commas in data field

I am importing several large CSV files. In the process, I discovered two important pieces of information:

  1. Certain data fields contain common delimiters (commas, pipes, and tabs);
  2. and I found the CSV is actually several CSVs merged together by a row without data.

Upvotes: 0

Views: 1521

Answers (1)

Phillip Deneka
Phillip Deneka

Reputation: 241

Seeing as this question has come up many times with several different answers, and few of the answers are limited to SQL query only, so I am writing the following:

 CREATE TABLE #table1
 (
  data varchar(max)
 )

 BULK
 INSERT table1
 FROM 'C:\Import\file.csv'
 WITH
 (
      FIRSTROW = 1,
      FIELDTERMINATOR = 'somethingthatdoesntexist',
      ROWTERMINATOR = '0x0A'
 ) 
 GO

 ALTER TABLE table1 ADD id numeric NOT NULL IDENTITY(1,1)

 ALTER TABLE report_table ADD CONSTRAINT
      pk_id PRIMARY KEY CLUSTERED
      (
           id
      ) WITH (
           statistics_norecompute = off,
           ignore_dup_key = off,
           allow_row_locks = on,
           allow_page_locks = on
 )
 ON [PRIMARY]

From there, you can use an appropriate ideology to construct where your headers are. The first row (or first X rows) should be headers, and then the first row(s) after an empty row. Then check for an existing table that matches or create a new table, and import that data until the next empty row. Lather, rinse, repeat.

 -- Create Processed(FALSE).  Set to TRUE when properly imported.
 ALTER TABLE table1 ADD processed bit NOT NULL
      CONSTRAINT c_is_processed_default_no DEFAULT FALSE

 -- Create Flagged(FALSE).  Set to TRUE when manual review required.
 --     Final step: set Flag = 1 where processed = 0; SELECT * FROM table1 WHERE flag = 1
 ALTER TABLE table1 ADD flag bit NOT NULL
      CONSTRAINT c_manual_review_default_no DEFAULT FALSE

Fair warning, the CSV I am importing is quite large, so I'm having to work through memory errors. Currently, I am working around this by executing each step in a separate batch.

Upvotes: 1

Related Questions