Rob None

Reputation: 551

Best way to insert millions of records

I'm working with a hosted service in C# ASP.NET Core, LINQ and T-SQL.

I need to insert records into my database one by one. Of course this is not a fast operation, but I'm not very experienced in this field, so maybe I'm doing it wrong. This is the code in my manager:

    public void StrategyMassive(string foldpathsave)
    {
        using (IServiceScope scope = _services.CreateScope())
        {
            List<string> filesreading = new List<string>();

            VUContext _context = scope.ServiceProvider.GetRequiredService<VUContext>();

            List<string> filesnumber = File.ReadAllLines(foldpathsave).ToList();

            filesreading = filesnumber.ToList();

            filesreading.RemoveRange(0, 2);

            foreach (string singlefile in filesreading)
            {

                //INTERNAL DATA NORMALIZATION

                _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);

                _repository.Save(_context);

            }
        }
    }

And this is my repository implementation:

    public void ImportAdd(VUContext _context, AVuTable newVUL, ACs2Table newC2, ACs3Table newC3, ADateTable newDATE)
    {
        _context.AVuTable.Add(newVUL);

        _context.ADateTable.Add(newDATE);

        if (newC2 != null)
        {
            _context.ACs2Table.Add(newC2);
        }

        if (newC3 != null)
        {
            _context.ACs3Table.Add(newC3);
        }
    }

    public void Save(VUContext _context)
    {
        _context.SaveChanges();
    }

It's all quite simple, I know, so how can I easily speed up this insert while keeping it one record at a time?

Upvotes: 0

Views: 1496

Answers (3)

Cetin Basoz

Reputation: 23797

In my experience, SqlBulkCopy is the fastest way to do it. filesnumber sounds like a misnomer, and I suspect you are reading a list of delimited files to be loaded into SQL Server after some normalization process. It would probably be even faster if you did the normalization on the server side, after loading the data into a temp table first. Here is a sample SqlBulkCopy from a delimited file:

void Main()
{
  Stopwatch sw = new Stopwatch();
  sw.Start();
  string sqlConnectionString = @"server=.\SQLExpress2012;Trusted_Connection=yes;Database=SampleDb";

  string path = @"d:\temp\SampleTextFiles";
  string fileName = @"combDoubledX.csv";

  using (OleDbConnection cn = new OleDbConnection(
    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source="+path+
    ";Extended Properties=\"text;HDR=No;FMT=Delimited\";"))

  using (SqlConnection scn = new SqlConnection( sqlConnectionString ))
  {
  OleDbCommand cmd = new OleDbCommand("select * from "+fileName, cn);

  SqlBulkCopy sbc = new SqlBulkCopy(scn, SqlBulkCopyOptions.TableLock,null);

  sbc.ColumnMappings.Add(0,"[Category]");
  sbc.ColumnMappings.Add(1,"[Activity]");
  sbc.ColumnMappings.Add(5,"[PersonId]");
  sbc.ColumnMappings.Add(6,"[FirstName]");
  sbc.ColumnMappings.Add(7,"[MidName]");
  sbc.ColumnMappings.Add(8,"[LastName]");
  sbc.ColumnMappings.Add(12,"[Email]");

  cn.Open();
  scn.Open();

  SqlCommand createTemp = new SqlCommand();
  createTemp.CommandText = @"if exists
   (SELECT * FROM tempdb.sys.objects 
   WHERE object_id = OBJECT_ID(N'[tempdb]..[##PersonData]','U'))
   BEGIN
        drop table [##PersonData];
   END

  create table ##PersonData 
  (
    [Id] int identity primary key,
    [Category] varchar(50), 
    [Activity] varchar(50) default 'NullOlmasin', 
    [PersonId] varchar(50), 
    [FirstName] varchar(50), 
    [MidName] varchar(50), 
    [LastName] varchar(50), 
    [Email] varchar(50)
  )
"; 
  createTemp.Connection = scn;
  createTemp.ExecuteNonQuery();

  OleDbDataReader rdr = cmd.ExecuteReader();

  sbc.NotifyAfter = 200000;
  //sbc.BatchSize = 1000;
  sbc.BulkCopyTimeout = 10000;
  sbc.DestinationTableName = "##PersonData";
  //sbc.EnableStreaming = true;

  sbc.SqlRowsCopied += (sender,e) =>
    {
    Console.WriteLine("-- Copied {0} rows to {1}.[{2} milliseconds]", 
      e.RowsCopied, 
      ((SqlBulkCopy)sender).DestinationTableName,
      sw.ElapsedMilliseconds);
    };

  sbc.WriteToServer(rdr);

  if (!rdr.IsClosed) { rdr.Close(); }

  cn.Close();
  scn.Close();
  }
  sw.Stop();
  sw.Dump(); // LINQPad's Dump(); use Console.WriteLine(sw.Elapsed) in a plain console app
}

And a few sample lines from that file:

"Computer Labs","","LRC 302 Open Lab","","","10057380","Test","","Cetin","","5550123456","","[email protected]"
"Computer Labs","","LRC 302 Open Lab","","","123456789","John","","Doe","","5551234567","","[email protected]"
"Computer Labs","","LRC 302 Open Lab","","","012345678","Mary","","Doe","","5556666444","","[email protected]"

You could also create and run a list of Task<>s, each doing a SqlBulkCopy from its own source (SqlBulkCopy accepts several kinds of readers).
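A rough sketch of that idea, assuming every input file maps to the same destination table; getReaderForFile and the table name dbo.PersonData are placeholders, not part of the original answer:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Threading.Tasks;

    public static class ParallelBulkCopy
    {
        // getReaderForFile is a hypothetical factory that returns an IDataReader
        // for one input file (for example, the OleDbDataReader shown above).
        public static Task CopyAllAsync(string connectionString,
                                        IEnumerable<string> files,
                                        Func<string, IDataReader> getReaderForFile)
        {
            var tasks = files.Select(file => Task.Run(() =>
            {
                using (var connection = new SqlConnection(connectionString))
                using (var reader = getReaderForFile(file))
                {
                    connection.Open();
                    using (var bulk = new SqlBulkCopy(connection))
                    {
                        bulk.DestinationTableName = "dbo.PersonData"; // placeholder table
                        bulk.BulkCopyTimeout = 10000;
                        bulk.WriteToServer(reader);
                    }
                }
            })).ToList();

            return Task.WhenAll(tasks);
        }
    }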

Upvotes: 2

George Polevoy

Reputation: 7671

For faster operation you need to reduce the number of database round trips.

Using the statement batching feature in EF Core

This feature is available only in EF Core, so you need to migrate to EF Core if you are still using EF 6.

Compare EF Core & EF6

For this feature to work you need to move the Save operation outside of the loop.
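Applied to the code from the question, that roughly means the following (the normalization part is elided exactly as in the original):

    foreach (string singlefile in filesreading)
    {
        //INTERNAL DATA NORMALIZATION

        _repository.ImportAdd(_context, newVUL, newC2, newC3, newDATE);
    }

    // One SaveChanges for the whole batch lets EF Core group the INSERTs
    // into far fewer round trips than one SaveChanges per record.
    _repository.Save(_context);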

Bulk insert

The bulk insert feature is designed to be the fastest way to insert a large number of database records.

Bulk Copy Operations in SQL Server

To use it you need the SqlBulkCopy class for SQL Server, and your code will need considerable rework.
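A minimal sketch of that rework, assuming the normalized records are first collected in memory; the table and column names here are placeholders, not taken from the question:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public static class BulkImport
    {
        // Collect the normalized records into a DataTable, then push them to
        // SQL Server in one bulk operation instead of row-by-row inserts.
        public static void Insert(string connectionString, IEnumerable<(string Code, DateTime Date)> rows)
        {
            var table = new DataTable();
            table.Columns.Add("Code", typeof(string));          // placeholder column
            table.Columns.Add("ImportDate", typeof(DateTime));  // placeholder column

            foreach (var row in rows)
            {
                table.Rows.Add(row.Code, row.Date);
            }

            using (var connection = new SqlConnection(connectionString))
            using (var bulk = new SqlBulkCopy(connection))
            {
                connection.Open();
                bulk.DestinationTableName = "dbo.AVuTable"; // placeholder destination table
                bulk.ColumnMappings.Add("Code", "Code");
                bulk.ColumnMappings.Add("ImportDate", "ImportDate");
                bulk.WriteToServer(table);
            }
        }
    }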

Upvotes: 0

TomTom

Reputation: 62093

Start by NOT using the slowest way to do it.

It starts with the way you actually load the files, and it continues with not using SqlBulkCopy - possibly in multiple threads - to write the data to the database.

What you are doing is the slowest possible way, because Entity Framework is NOT an ETL tool.

Btw., one transaction per item (SaveChanges) does not help either. It makes a super slow solution really, really, really super slow.

I manage to load around 64k rows per second per thread, with 4-6 threads running in parallel.

Upvotes: 2
