Reputation: 60882
I have about 20 .csv files which are around 100-200 MB each.
They each have about 100 columns.
90% of the columns are the same across the files; however, some files have more columns and some have fewer.
I need to import all of these files into one table in a SQL Server 2008 database.
If a field does not exist, I need it to be created.
Question: what should the process for this import be? How do I import all of these files into one table efficiently and quickly, and make sure that any field that does not yet exist gets created? Please also keep in mind that the same field might be in a different position in each file. For example, CAR can be in column AB in one csv, whereas the same field name (CAR) can be in column AC in another csv file. The solution can be SQL or C# or both.
Upvotes: 2
Views: 3040
Reputation: 6900
There are several possibilities here.
Upvotes: 1
Reputation: 9583
Keep (or create) a runtime representation of the target table's columns. Before importing each file, check whether each of its columns already exists. If one doesn't, run the appropriate ALTER TABLE statement. Then import the file.
The actual import can, and probably should, be done with BCP or whatever bulk-load mechanism you have available. You will have to do some fancy kajiggering, since the source and destination columns align only logically, not physically, so you will need BCP format files.
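As a rough sketch of that column check in C# (not the poster's code; the table name and the NVARCHAR(255) NULL default for new columns are assumptions):

```csharp
using System.Data.SqlClient;

static class SchemaSync
{
    // Adds the column with a permissive type if the target table lacks it.
    public static void EnsureColumnExists(SqlConnection conn, string table, string column)
    {
        const string checkSql =
            @"SELECT COUNT(*) FROM sys.columns
              WHERE object_id = OBJECT_ID(@table) AND name = @column";

        using (var check = new SqlCommand(checkSql, conn))
        {
            check.Parameters.AddWithValue("@table", table);
            check.Parameters.AddWithValue("@column", column);
            if ((int)check.ExecuteScalar() > 0)
                return; // column is already there
        }

        // Identifiers cannot be parameterized, so escape the brackets by hand.
        string alterSql = string.Format(
            "ALTER TABLE {0} ADD [{1}] NVARCHAR(255) NULL",
            table, column.Replace("]", "]]"));

        using (var alter = new SqlCommand(alterSql, conn))
            alter.ExecuteNonQuery();
    }
}
```

Call it once per header column of each file before kicking off BCP for that file.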
Upvotes: 1
Reputation: 14133
For those data volumes, you should use an ETL tool. See this tutorial.
ETL tools are designed for manipulating large amounts of data.
Upvotes: 0
Reputation: 1035
Less of an answer and more of a direction, but here I go. The way I would do it is to first enumerate the column names from both the CSV files and the DB, then make sure the ones from your CSVs all exist in the destination.
Once you have validated and/or created all the columns, you can do your bulk insert. Assuming you don't have multiple imports happening at the same time, you can cache the column names from the DB when you start the import, since they shouldn't be changing.
If you will have multiple imports running at the same time, you will need to make sure you hold a full table lock during the import, as race conditions could show up.
I do a lot of automated imports for SQL DBs, and I haven't ever seen what you're describing, since it's normally assumed that you know the data coming into the DB ahead of time. Not knowing the columns ahead of time is typically a very bad thing, but it sounds like you have an exception to the rule.
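A minimal sketch of that enumeration step, assuming the header sits on the first line and the files contain no quoted fields with embedded commas (both my assumptions, not something stated in the question):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

static class ColumnEnumeration
{
    // Cache the destination table's column names once, up front.
    public static HashSet<string> GetDbColumns(SqlConnection conn, string table)
    {
        var cols = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        const string sql = "SELECT name FROM sys.columns WHERE object_id = OBJECT_ID(@t)";
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@t", table);
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    cols.Add(reader.GetString(0));
        }
        return cols;
    }

    // Read only the header line of a CSV to get its column names.
    public static string[] GetCsvColumns(string path)
    {
        using (var sr = new StreamReader(path))
            return (sr.ReadLine() ?? "").Split(',').Select(c => c.Trim()).ToArray();
    }
}
```

Diff the two sets, issue an ALTER TABLE ... ADD for anything missing, and only then start the bulk insert.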
Upvotes: 1
Reputation: 121
Use the SqlBulkCopy class in System.Data.SqlClient.
It facilitates bulk data transfer. The only catch is that it won't work with a DateTime DB column.
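For example, here is a sketch (the table name dbo.Target is made up, and it assumes you have already parsed a CSV file into a DataTable whose column names match the file's header). Mapping by name rather than by ordinal also handles the asker's "CAR in column AB vs. AC" problem:

```csharp
using System.Data;
using System.Data.SqlClient;

static class BulkLoader
{
    public static void BulkLoad(string connectionString, DataTable csvData)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Target"; // assumed table name
                bulk.BatchSize = 10000;

                // Map source to destination by column NAME, not position,
                // so it does not matter where a given field sits in each file.
                foreach (DataColumn col in csvData.Columns)
                    bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);

                bulk.WriteToServer(csvData);
            }
        }
    }
}
```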
Upvotes: 1
Reputation: 14787
I would recommend looking at the BCP program, which comes with SQL Server and is intended to help with jobs just like this:
http://msdn.microsoft.com/en-us/library/aa337544.aspx
There are "format files" which allow you to specify which CSV columns go to which SQL columns.
If you are more inclined to use C#, have a look at the SqlBulkCopy class:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
Also take a look at this SO thread about importing from CSV files into SQL Server:
Upvotes: 3
Reputation: 21776
You may choose among a number of options:
1. Use a DTS package.
2. Try to produce one uniform CSV file, get the db table in sync with its columns, and bulk insert it.
3. Bulk insert every file into its own staging table, then merge the staging tables into the target table (a sketch of the merge step follows below).
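For option 3, a rough sketch of the per-file load-and-merge step in C# (dbo.Staging1, dbo.Target, and the column names are placeholders; the staging table must already match the file's layout):

```csharp
using System.Data.SqlClient;

static class StagingLoad
{
    public static void LoadViaStaging(SqlConnection conn, string csvPath)
    {
        // Bulk insert the file into its own staging table, skipping the header row.
        string bulkSql = string.Format(
            @"BULK INSERT dbo.Staging1 FROM '{0}'
              WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);",
            csvPath);
        using (var cmd = new SqlCommand(bulkSql, conn))
            cmd.ExecuteNonQuery();

        // Merge into the target by naming the columns explicitly, so their
        // position in the staging table does not matter.
        const string mergeSql =
            @"INSERT INTO dbo.Target (Make, Model, [Year])
              SELECT Make, Model, [Year] FROM dbo.Staging1;";
        using (var cmd = new SqlCommand(mergeSql, conn))
            cmd.ExecuteNonQuery();
    }
}
```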
Upvotes: 3
Reputation: 7451
I recommend writing a small C# application that reads each of the CSV file headers, stores a dictionary of the columns needed, and either outputs a CREATE TABLE statement or directly runs a CREATE TABLE operation on the database. Then you can use SQL Server Management Studio to load the 20 files individually using its import wizard.
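A sketch of that header scan, assuming a comma-only split is good enough and that every column can start life as NVARCHAR(255) NULL (both assumptions; dbo.AllCsvData is a placeholder name):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class TableBuilder
{
    // Collects the union of every header column across all files, in first-seen
    // order, and emits a CREATE TABLE statement covering them all.
    public static string BuildCreateTable(IEnumerable<string> csvPaths)
    {
        var columns = new List<string>();
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (var path in csvPaths)
        {
            using (var sr = new StreamReader(path))
            {
                foreach (var col in (sr.ReadLine() ?? "").Split(',').Select(c => c.Trim()))
                    if (seen.Add(col))
                        columns.Add(col);
            }
        }

        var defs = columns.Select(c => string.Format("[{0}] NVARCHAR(255) NULL", c));
        return "CREATE TABLE dbo.AllCsvData (\r\n  "
             + string.Join(",\r\n  ", defs.ToArray()) + "\r\n);";
    }
}
```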
Upvotes: 1