RemarkLima
RemarkLima

Reputation: 12037

Entity Framework bulk inserts - creating a large number of objects / entities is the slow part?

There's a lot of information and answers on here and elsewhere about how to speed up bulk inserts with EF.

However, I'm only inserting about 10,000 records and the slow part of creating the entities.

First, I save the external data to a datatable, then loop through and for each row create a new entity, add the child entities in the loop (sourced from a couple of other datatables) and once the loop is finished call db.SaveChanges().

In my initial profiling, the db.SaveChanges() is slow, but nothing compared to the loop of creating all the objects in the first place - either as a separate List<entity> entities or directly to the context.

Is this usual? As of all the issues with bulk insert I can see, most seem to be related to the final commit to the database.


Edit to add some code Please excuse the psudo-code

DataTable ref1 = ConvertCSVtoDataTable(csv, firstRowsToDelete: 15); // Return's a Datatable from a CSV

foreach(string file in ListOfFilesToProcess)
{
    DataTable tbl = loadExcelFiles(file);

    foreach(DataRow dr in tbl.Rows)
    {
         Entity newEntity = new Entity();
         Entity.property1 = dr["Property1"].ToString();
         ... // Keep mapping properties to elements in the datatable
         Entity.Child.Add(new ChildEntity() { prop1 = ref1["ChildProp1"].ToString() });
         // Add the newly created entity to the context
         db.Entity.Add(newEntity);
    }
}
// Save the context
db.SaveChanges();

So by the point of saving the context there's a few thousand newEntity objects and their child navigation objects.

Iterating over the datatable and creating all these objects is the part which is the slowest!

Also, db.Configuration.AutoDetectChangesEnabled = false; has already been set.

Upvotes: 1

Views: 7814

Answers (2)

Jonathan Magnan
Jonathan Magnan

Reputation: 11337

The slow part in your “creating all the objects” section is the Detect Changes.

You should always use AddRange over Add

  • Add: Detect changes for every object
  • AddRange: Detect changes only once (after all object are added)

This code should fix the slow part when creating objects:

DataTable ref1 = ConvertCSVtoDataTable(csv, firstRowsToDelete: 15); // Return's a Datatable from a CSV

List<Entity> list = new List<Entity>();

foreach(string file in ListOfFilesToProcess)
{
    DataTable tbl = loadExcelFiles(file);

    foreach(DataRow dr in tbl.Rows)
    {
         Entity newEntity = new Entity();
         Entity.property1 = dr["Property1"].ToString();
         ... // Keep mapping properties to elements in the datatable
         Entity.Child.Add(new ChildEntity() { prop1 = ref1["ChildProp1"].ToString() });

        list.Add(newEntity);
    }
}

// Add all newly created entities to the context
db.Entity.AddRange(list);

// Save the context
db.SaveChanges();

If after this fix, you still have some issue performance (from database this time), you should try a BulkSaveChanges / Bulk Insert third party library.

Here is an article about these libraries: Entity Framework - Bulk Insert Library Reviews & Comparisons.

  • Entity Framework Extensions (Recommanded, support everything)
  • EntityFramework.BulkInsert
  • EntityFramework.Utilities

Disclaimer: I'm the owner of the project Entity Framework Extensions

Upvotes: 1

Jonathan Allen
Jonathan Allen

Reputation: 70307

Just use a bulk insert. Even if you get past EF's ridiculously bad performance, you still have to contend with the database not liking individual inserts.

Upvotes: 1

Related Questions