Reputation: 1291
I have a text file that contains around 2.1 million (21 lakh) entries, and I want to insert all of them into a table. Initially I wrote a function in C# that reads the file line by line and inserts each row into the table, but it takes too long. Please suggest an efficient way to insert this bulk data; the file uses TAB (4 spaces) as the delimiter.
The text file also contains some duplicate entries, and I don't want to insert those.
Upvotes: 0
Views: 2204
Reputation: 6020
Load all of your data into a DataTable object and then use SqlBulkCopy to bulk insert it:
DataTable dtData = new DataTable("Data");
// load your data here

using (SqlConnection dbConn = new SqlConnection("db conn string"))
{
    dbConn.Open();
    using (SqlTransaction dbTrans = dbConn.BeginTransaction())
    {
        try
        {
            using (SqlBulkCopy dbBulkCopy = new SqlBulkCopy(dbConn, SqlBulkCopyOptions.Default, dbTrans))
            {
                dbBulkCopy.DestinationTableName = "intended SQL table name";
                dbBulkCopy.WriteToServer(dtData);
            }
            dbTrans.Commit();
        }
        catch
        {
            dbTrans.Rollback();
            throw;
        }
    }
    dbConn.Close();
}
I've wrapped the example in a SqlTransaction so there will be a full rollback if there's a failure along the way. To get you started, here's a good CodeProject article on loading delimited data into a DataSet object.
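One more thing worth knowing: if the column names in your DataTable don't match the destination table exactly, SqlBulkCopy lets you map them explicitly, and you can also send the rows in batches. A rough, untested sketch (the table and column names below are placeholders, not from your data):

// optional: configure the bulk copy before calling WriteToServer
dbBulkCopy.DestinationTableName = "dbo.MyTable";   // placeholder table name
dbBulkCopy.BatchSize = 10000;                      // send rows to the server in chunks
// map DataTable column -> destination table column (placeholder names)
dbBulkCopy.ColumnMappings.Add("DataTableColumn", "SqlTableColumn");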
Sanitizing the data before loading
OK, here's how I think your data looks:
CC_FIPS FULL_NAME_ND
AN Xixerella
AN Vila
AN Sornas
AN Soldeu
AN Sispony
... (cut down for brevity)
In this instance you want to create your DataTable like this:
DataTable dtData = new DataTable("Data");
dtData.Columns.Add("CC_FIPS");
dtData.Columns.Add("FULL_NAME_ND");
Then you want to iterate over each row (assuming your tab-delimited data is separated row by row by carriage returns) and check whether that data already exists in the DataTable using the .Select method; if there is a match (I'm checking BOTH values here, it's up to you whether you want to do something else), don't add it, thereby preventing duplicates.
using (FileStream fs = new FileStream("path to your file", FileMode.Open, FileAccess.Read))
{
    int rowIndex = 0;
    using (StreamReader sr = new StreamReader(fs))
    {
        string line = string.Empty;
        while (!sr.EndOfStream)
        {
            line = sr.ReadLine();
            // use a row index to skip the header row, as you don't want to insert CC_FIPS and FULL_NAME_ND
            if (rowIndex > 0)
            {
                // split the line into its tab-delimited parts
                string[] parts = line.Split('\t');
                // now check whether this data has already been added to the DataTable
                DataRow[] rows = dtData.Select("CC_FIPS = '" + parts[0] + "' and FULL_NAME_ND = '" + parts[1] + "'");
                if (rows.Length == 0)
                {
                    // if there are no matching rows, the data doesn't exist yet, so add it
                    DataRow nr = dtData.NewRow();
                    nr["CC_FIPS"] = parts[0];
                    nr["FULL_NAME_ND"] = parts[1];
                    dtData.Rows.Add(nr);
                }
            }
            rowIndex++;
        }
    }
}
At the end of this you should have a sanitized DataTable that you can bulk insert. Please note that this code isn't tested; it's a best guess at how you should do it. There are many ways this could be done, probably a lot better than this method (specifically LINQ) - but it's a starting point.
Upvotes: 4