Puneet Purohit
Puneet Purohit

Reputation: 1291

bulk insertion in MS SQL from a text file

I have a text file that contains around 21 lac entries and I want to insert all these entries into a table. Initially I have created one function in c# that read line by line and insert into table but it takes too much time. Please suggest an efficient way to insert these bulk data and that file is containing TAB(4 spaces) as delimiter.
And that text file also containing some duplicate entries and I don't want to insert those entries.

Upvotes: 0

Views: 2204

Answers (1)

Paul Aldred-Bann
Paul Aldred-Bann

Reputation: 6020

Load all of your data into a DataTable object and then use SqlBulkCopy to bulk insert them:

DataTable dtData = new DataTable("Data");

// load your data here

using (SqlConnection dbConn = new SqlConnection("db conn string"))
{
    dbConn.Open();

    using (SqlTransaction dbTrans = dbConn.BeginTransaction())
    {
        try
        {
            using (SqlBulkCopy dbBulkCopy = new SqlBulkCopy(dbConn, SqlBulkCopyOptions.Default, dbTrans))
            {
                dbBulkCopy.DestinationTableName = "intended SQL table name";
                dbBulkCopy.WriteToServer(dtData );
            }

            dbTrans.Commit();
        }
        catch
        {
            dbTrans.Rollback();
            throw;
        }
    }

    dbConn.Close();
}

I've included the example to wrap this into a SqlTransaction so there will be a full rollback if there's a failure along the way. To get you started, here's a good CodeProject article on loading the delimited data into a DataSet object.

Sanitizing the data before loading

OK, here's how I think your data looks:

CC_FIPS    FULL_NAME_ND 
AN         Xixerella
AN         Vila 
AN         Sornas
AN         Soldeu
AN         Sispony
... (cut down for brevity)

In this instance you want to create your DataTable like this:

DataTable dtData = new DataTable("Data");
dtData.Columns.Add("CC_FIPS");
dtData.Columns.Add("FULL_NAME_ND");

Then you want to iterate each row (assuming your tab delimited data is separated row-by-row by carriage returns) and check whether this data already exists in the DataTable using the .Select method and if there is a match (i'm checking for BOTH values, it's up to you whether you want to do something else) then don't add it thereby preventing duplicates.

using (FileStream fs = new FileStream("path to your file", FileMode.Open, FileAccess.Read))
{
    int rowIndex = 0;
    using (StreamReader sr = new StreamReader(fs))
    {
        string line = string.Empty;
        while (!sr.EndOfStream)
        {
            line = sr.ReadLine();

            // use a row index to skip the header row as you don't want to insert CC_FIPS and FULL_NAME_ND
            if (rowIndex > 0)
            {
                // split your data up into a 2-d array tab delimited
                string[] parts = line.Split('\t');

                // now check whether this data has already been added to the datatable
                DataRow[] rows = dtData.Select("CC_FIPS = '" + parts[0] + "' and FULL_NAME_ND = '" + parts[1] + "'");
                if (rows.Length == 0)
                {
                    // if there're no rows, then the data doesn't exist so add it
                    DataRow nr = dtData.NewRow();
                    nr["CC_FIPS"] = parts[0];
                    nr["FULL_NAME_ND"] = parts[1];
                    dtData.Rows.Add(nr);
                }
            }

            rowIndex++;
        }
    }
}

At the end of this you should have a sanitized DataTable that you can bulk insert. Please note that this code isn't tested, but it's a best guess as to how you should do it. There are many ways this can be done, and probably a lot better than this method (specifically LINQ) - but it's a starting point.

Upvotes: 4

Related Questions