Alex Gordon

Reputation: 60882

How to bulk insert twenty 100 MB CSV files into SQL Server

I have about 20 .csv files which are around 100–200 MB each.

They each have about 100 columns.

90% of the columns are the same across the files; however, some files have more columns and some have fewer.

I need to import all of these files into one table in a SQL Server 2008 database.

If the field does not exist, I need it to be created.

Question: What should the process for this import be? How do I import all of these files into one table quickly and efficiently, making sure that any field that does not exist yet gets created? Please also keep in mind that the same field might be in a different position: for example, CAR can be in column AB in one CSV, while the same field name (CAR) can be in column AC in another CSV file. The solution can be SQL or C# or both.

Upvotes: 2

Views: 3040

Answers (8)

TehBoyan

Reputation: 6900

There are several possibilities here.

  • You can use SSIS if it is available to you.
  • From .NET you can use SqlBulkCopy to bulk insert into a staging table that holds the whole .csv file, and then use a stored procedure, possibly with a MERGE statement, to place each row where it belongs or create a new one if it doesn't exist (see the sketch after this list).
  • You can use C# code to read the files and write them with SqlBulkCopy or an EntityDataReader.

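A minimal sketch of the staging-table approach, assuming a hypothetical staging table dbo.CarStaging whose columns match the CSV headers and a target table dbo.Cars keyed on a hypothetical Id column (all names here are illustrative, not from the question):

```csharp
using System.Data;
using System.Data.SqlClient;
using Microsoft.VisualBasic.FileIO; // TextFieldParser copes with quoted CSV fields

// Load one CSV into a DataTable. Fine for 100-200 MB files if memory allows;
// otherwise stream the rows with an IDataReader instead.
static DataTable LoadCsv(string path)
{
    var table = new DataTable();
    using (var parser = new TextFieldParser(path))
    {
        parser.SetDelimiters(",");
        foreach (var header in parser.ReadFields())
            table.Columns.Add(header, typeof(string));
        while (!parser.EndOfData)
            table.Rows.Add(parser.ReadFields());
    }
    return table;
}

static void ImportFile(string path, string connStr)
{
    DataTable csv = LoadCsv(path);
    using (var cn = new SqlConnection(connStr))
    {
        cn.Open();

        // 1. Bulk copy into the staging table, mapping by column NAME so the
        //    physical position of a field in the CSV does not matter.
        using (var bulk = new SqlBulkCopy(cn) { DestinationTableName = "dbo.CarStaging" })
        {
            foreach (DataColumn col in csv.Columns)
                bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
            bulk.WriteToServer(csv);
        }

        // 2. MERGE staging rows into the target; column lists are illustrative.
        const string merge = @"
            MERGE dbo.Cars AS t
            USING dbo.CarStaging AS s ON t.Id = s.Id
            WHEN MATCHED THEN UPDATE SET t.CAR = s.CAR
            WHEN NOT MATCHED THEN INSERT (Id, CAR) VALUES (s.Id, s.CAR);
            TRUNCATE TABLE dbo.CarStaging;";
        using (var cmd = new SqlCommand(merge, cn))
            cmd.ExecuteNonQuery();
    }
}
```
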
Upvotes: 1

Mark Canlas

Reputation: 9583

Roll your own.

Keep (or create) a runtime representation of the target table's columns in the database. Before importing each file, check to see if the column exists already. If it doesn't, run the appropriate ALTER statement. Then import the file.
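
A sketch of that check, assuming a hypothetical target table dbo.Cars and typing every new column as NVARCHAR, since a CSV header carries no type information:

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

// Add any CSV header columns that the target table does not have yet.
static void EnsureColumns(SqlConnection cn, IEnumerable<string> csvHeaders)
{
    var existing = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    const string query = @"
        SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'Cars'";
    using (var cmd = new SqlCommand(query, cn))
    using (var rdr = cmd.ExecuteReader())
        while (rdr.Read())
            existing.Add(rdr.GetString(0));

    foreach (var col in csvHeaders.Where(c => !existing.Contains(c)))
    {
        // Bracket-quote the name (and escape any ']') to survive odd headers.
        var alter = "ALTER TABLE dbo.Cars ADD [" + col.Replace("]", "]]") + "] NVARCHAR(255) NULL";
        using (var cmd = new SqlCommand(alter, cn))
            cmd.ExecuteNonQuery();
    }
}
```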

The actual import can and probably should be done by BCP or whatever bulk protocol you have available. You will have to do some fancy kajiggering since the source data and the destination align only logically, not physically, so you will need BCP format files; a sample is sketched below.
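
For illustration, a non-XML BCP format file might look like the sketch below (version 10.0 corresponds to SQL Server 2008; the table and column names are made up). The second-to-last value on each row is the server column's ordinal position, which is what lets a CSV field land in a table column at a different position:

```
10.0
3
1  SQLCHAR  0  100  ","      2  Make  SQL_Latin1_General_CP1_CI_AS
2  SQLCHAR  0  100  ","      3  CAR   SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  12   "\r\n"   1  Year  ""
```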

Upvotes: 1

Romias

Reputation: 14133

For those data volumes, you should use an ETL tool. See this tutorial.

ETL tools are designed for manipulating large amounts of data.

Upvotes: 0

Bengie

Reputation: 1035

Less of an answer and more of a direction, but here goes: the way I would do it is to first enumerate the column names from both the CSV files and the DB, then make sure the ones from your CSVs all exist in the destination (a header-scanning sketch follows).
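
A sketch of the CSV side of that enumeration, reading only the first line of each file (this assumes plain comma-separated headers without embedded commas):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Union of column names across all CSV files; reads one line per file.
static ISet<string> CollectCsvColumns(IEnumerable<string> csvPaths)
{
    var columns = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    foreach (var path in csvPaths)
    {
        using (var reader = new StreamReader(path))
        {
            var header = reader.ReadLine();
            if (header == null) continue; // skip empty files
            foreach (var name in header.Split(',').Select(h => h.Trim().Trim('"')))
                columns.Add(name);
        }
    }
    return columns;
}
```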

Once you have validated and/or created all the columns, then you can do your bulk insert. Assuming you don't have multiple imports happening at the same time, you could cache the column names from the DB when you start the import, as they shouldn't be changing.

If you will have multiple imports running at the same time, then you will need to make sure you have a full table lock during the import, as race conditions could show up.

I do a lot of automated imports for SQL DBs, and I have never seen what you're asking for, as it's an assumed requirement that one knows the data coming into the DB. Not knowing the columns ahead of time is typically a very bad thing, but it sounds like you have an exception to the rule.

Upvotes: 1

Darshan

Reputation: 121

Use the SqlBulkCopy class in System.Data.SqlClient.

It facilitates bulk data transfer. The only catch is that it won't work with a DateTime DB column.
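
A minimal sketch, assuming a hypothetical dbo.Cars target and a DataTable already loaded from the CSV. Mapping by name rather than position handles files whose columns sit in different order, and the TableLock option takes a full table lock for the duration of the copy:

```csharp
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(string connStr, DataTable csv)
{
    using (var bulk = new SqlBulkCopy(connStr, SqlBulkCopyOptions.TableLock))
    {
        bulk.DestinationTableName = "dbo.Cars";
        bulk.BatchSize = 10000; // commit in chunks rather than one huge batch

        // Map by name, not position, so CAR lands in the CAR column
        // regardless of where it sits in this particular CSV.
        foreach (DataColumn col in csv.Columns)
            bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);

        bulk.WriteToServer(csv);
    }
}
```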

Upvotes: 1

JohnD
JohnD

Reputation: 14787

I would recommend looking at the BCP program, which comes with SQL Server and is intended to help with jobs just like this:

http://msdn.microsoft.com/en-us/library/aa337544.aspx

There are "format files" which allow you to specify which CSV columns go to which SQL columns.
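
For example, an invocation along these lines (the database, file, and format-file names are made up) loads one CSV through a format file, with -F 2 skipping the header row and -T using Windows authentication:

```
bcp MyDb.dbo.Cars in cars_01.csv -f cars.fmt -F 2 -S myserver -T
```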

If you are more inclined to use C#, have a look at the SqlBulkCopy class:

http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

Also take a look at this SO thread about importing from CSV files into SQL Server:

SQL Bulk import from CSV

Upvotes: 3

Oleg Dok

Reputation: 21776

You may choose from a number of options:

1. Use the DTS package.
2. Try to produce one uniform CSV file, get the db table in sync with its columns, and bulk insert it.
3. Bulk insert every file into its own table, and after that merge the tables into the target table (a sketch of this step follows).
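
A sketch of the merge step in option 3, assuming hypothetical staging tables Staging1 and Staging2, where Staging2's file happened to lack the CAR column:

```csharp
using System.Data.SqlClient;

static void MergeStagingTables(string connStr)
{
    // Columns a given file did not have are simply filled with NULL.
    const string merge = @"
        INSERT INTO dbo.Cars (Make, Model, CAR)
        SELECT Make, Model, CAR  FROM dbo.Staging1
        UNION ALL
        SELECT Make, Model, NULL FROM dbo.Staging2;";
    using (var cn = new SqlConnection(connStr))
    using (var cmd = new SqlCommand(merge, cn))
    {
        cn.Open();
        cmd.ExecuteNonQuery();
    }
}
```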

Upvotes: 3

Andrew Hanlon

Reputation: 7451

I recommend writing a small C# application that reads each of the CSV file headers and stores a dictionary of the columns needed, then either outputs a 'create table' statement or directly runs a create table operation on the database (a sketch follows). Then you can use SQL Server Management Studio to load the 20 files individually using the import routine.
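
A sketch of the 'create table' generation from the collected headers, typing every column as NVARCHAR since the CSVs carry no type information (the table name and column size are assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

// Emit a CREATE TABLE covering every column seen across the CSV headers.
static string BuildCreateTable(IEnumerable<string> columns)
{
    var defs = columns.Select(c => "    [" + c.Replace("]", "]]") + "] NVARCHAR(255) NULL");
    return "CREATE TABLE dbo.Cars (\n" + string.Join(",\n", defs) + "\n);";
}
```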

Upvotes: 1
