Why is there a significant performance gain when doing multiple inserts in one command instead of a single insert per command?

Question

I want to insert around 3000 records. With approach 1 it takes around 2 minutes to complete, but with approach 2 the insert completes in less than a second. Approach 2 doesn't adhere to good practice, but it gives me a large performance gain. I would like to understand why approach 1 takes so much time, and whether there is a better way to do this.

Approach 1:

public static void InsertModelValue(DataSet employeData, int clsaId)
{
    // The ODBC provider binds parameters by position, so the SQL uses "?" markers.
    var query = @"INSERT INTO employee (id, name)
                  VALUES (?, ?)";
    using (var connection = GetOdbcConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {
            foreach (DataRow row in employeData.Tables[0].Rows)
            {
                // One command, and therefore one round trip to the database, per row.
                using (var cmd = new OdbcCommand(query, connection, tran))
                {
                    // Parameters must be added in the same order as the "?" markers.
                    cmd.Parameters.Add("@id", OdbcType.Int).Value = Convert.ToInt32(row["ID"]);
                    cmd.Parameters.Add("@name", OdbcType.VarChar).Value = row["Name"];
                    cmd.ExecuteNonQuery();
                }
            }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}

Approach 2:

public static void InsertModelValueInBulk(DataSet employeData, int clsaId, int batchSize)
{
    // Holds the SELECT for each row of the batch currently being built.
    string[] insertStatement = new string[batchSize];
    using (var connection = GetOdbcConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {
            int j = 0; // number of rows buffered in the current batch
            for (int i = 0; i < employeData.Tables[0].Rows.Count; i++)
            {
                var row = employeData.Tables[0].Rows[i];
                // Values are concatenated into the SQL text, so this is open to SQL injection.
                // The SELECT order matches the column list: id first, then name.
                insertStatement[j] = string.Format("select {0},'{1}'", Convert.ToInt32(row["ID"]), row["name"]);
                j = j + 1;
                if (j == batchSize)
                {
                    // Batch is full: send all buffered rows in one command.
                    // "union all" keeps duplicate rows; plain "union" would silently drop them.
                    var finalQuery = "INSERT INTO employee (id, name) "
                        + String.Join(" union all ", insertStatement);
                    using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    j = 0;
                }
            }

            if (j > 0)
            {
                // Send the final, partially filled batch (the first j entries).
                var finalQuery = "INSERT INTO employee (id, name) "
                    + String.Join(" union all ", insertStatement, 0, j);
                using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                {
                    cmd.ExecuteNonQuery();
                }
            }

            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}

Eric Lippert · Accepted Answer

You want to deposit three thousand dollars in your bank account. Which is faster:

wait for a teller
take a dollar out of your wallet
show your id to the teller
deposit the dollar
go to the end of the line
repeat the whole process 2999 more times, then go home.

or

wait for a teller
take three thousand dollars out of your wallet
show your id to the teller
deposit the three thousand dollars
go home

?

It should be fairly obvious that the first one is a lot slower than the second one. Now is it clear why the first technique is hundreds of times slower than the second?
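
In code terms, each ExecuteNonQuery call is one trip to the teller. Going by the figures in the question, roughly 120 seconds for about 3000 rows works out to around 40 ms per statement, which looks more like per-command overhead than the cost of writing one small row. Below is a rough sketch of the "deposit it all at once" version that still uses parameters. The class and method names are made up for illustration, the helper is written against the provider-neutral System.Data interfaces (which OdbcConnection implements), and it assumes the database accepts a multi-row VALUES list and tolerates that many parameters in one command. Neither assumption holds for every database or driver.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper, not part of the original question or answer.
public static class BatchInsertSketch
{
    // Builds one INSERT covering rowCount rows, using positional "?" markers.
    // Assumption: the target database accepts multi-row VALUES lists.
    public static string BuildInsertSql(int rowCount)
    {
        if (rowCount < 1) throw new ArgumentOutOfRangeException(nameof(rowCount));
        var rows = Enumerable.Repeat("(?, ?)", rowCount);
        return "INSERT INTO employee (id, name) VALUES " + string.Join(", ", rows);
    }

    // Number of commands (round trips) needed for totalRows at batchSize rows each.
    public static int CommandCount(int totalRows, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return (totalRows + batchSize - 1) / batchSize;
    }

    // Sends the rows in batches inside the caller's transaction.
    // Returns the number of commands that were executed.
    public static int InsertInBatches(IDbConnection connection, IDbTransaction tran,
        IReadOnlyList<(int Id, string Name)> rows, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int commands = 0;
        for (int start = 0; start < rows.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, rows.Count - start);
            using (var cmd = connection.CreateCommand())
            {
                cmd.Transaction = tran;
                cmd.CommandText = BuildInsertSql(count);
                // Parameters are added in the same order as the "?" markers.
                for (int i = start; i < start + count; i++)
                {
                    AddParameter(cmd, rows[i].Id);
                    AddParameter(cmd, rows[i].Name);
                }
                cmd.ExecuteNonQuery();
                commands++;
            }
        }
        return commands;
    }

    private static void AddParameter(IDbCommand cmd, object value)
    {
        var p = cmd.CreateParameter();
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

With an open connection and its transaction, InsertInBatches(connection, tran, rows, 500) would issue 6 commands for 3000 rows instead of 3000. The batch size is a judgment call: some servers cap the number of parameters per command, so it may need to be smaller.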
