Mike Christensen

Reputation: 91716

Inserting many rows with Entity Framework is extremely slow

I'm using Entity Framework to build a database. There are two models: Workers and Skills. Each Worker has zero or more Skills. I first read this data into memory from a CSV file and store it in a dictionary called allWorkers. Next, I write the data to the database as follows:

// Populate database
using (var db = new SolverDbContext())
{
   // Add all distinct skills to database
   db.Skills.AddRange(allSkills
      .Distinct(StringComparer.InvariantCultureIgnoreCase)
      .Select(s => new Skill
      {
         Reference = s
      }));

   db.SaveChanges(); // Very quick
   var dbSkills = db.Skills.ToDictionary(k => k.Reference, v => v);

   // Add all workers to database
   var workforce = allWorkers.Values
      .Select(i => new Worker
      {
         Reference = i.EMPLOYEE_REF,
         Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
         DefaultRegion = "wa",
         DefaultEfficiency = i.TECH_EFFICIENCY
      });

   db.Workers.AddRange(workforce);
   db.SaveChanges(); // This call takes 00:05:00.0482197
}

The last db.SaveChanges(); takes over five minutes to execute, which I feel is far too long. I ran SQL Server Profiler while the call was executing and found thousands of calls to:

INSERT [dbo].[SkillWorkers]([Skill_SkillId], [Worker_WorkerId])
VALUES (@0, @1)

There are 16,027 rows being added to SkillWorkers, which is a fair amount of data but not huge by any means. Is there any way to optimize this code so it doesn't take five minutes to run?

Update: I've looked at other possible duplicates, such as this one, but I don't think they apply. First, I'm not bulk adding anything in a loop. I make a single call to db.SaveChanges(); after every row has been added to db.Workers. This should be the fastest way to bulk insert. Second, I've set db.Configuration.AutoDetectChangesEnabled to false. The SaveChanges() call now takes 00:05:11.2273888 (in other words, about the same). I don't think this setting really matters, since every row is new and there are no changes to detect.

I think what I'm looking for is a way to issue a single INSERT statement containing all 16,000 SkillWorkers rows.

Upvotes: 0

Views: 1759

Answers (1)

JD Davis

Reputation: 3720

One easy option is the EntityFramework.BulkInsert extension.

With it installed, you can replace the AddRange and SaveChanges calls for the workers with:

// Add all workers to database
var workforce = allWorkers.Values
   .Select(i => new Worker
   {
      Reference = i.EMPLOYEE_REF,
      Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
      DefaultRegion = "wa",
      DefaultEfficiency = i.TECH_EFFICIENCY
   });

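// BulkInsert is an extension method; it needs
// "using EntityFramework.BulkInsert.Extensions;" at the top of the file.
db.BulkInsert(workforce);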
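If the SkillWorkers join rows remain the bottleneck (I'm not certain BulkInsert writes many-to-many join rows at all), a different technique is to load that table directly with SqlBulkCopy from System.Data.SqlClient. The following is only a minimal sketch under assumptions: the class name SkillWorkerBulkLoader and its two methods are hypothetical, the key columns are assumed to be int, and the table and column names are taken from the INSERT statement shown in the question.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical helper, not part of Entity Framework or any library.
public static class SkillWorkerBulkLoader
{
    // Builds an in-memory table whose columns match [dbo].[SkillWorkers].
    // Each pair holds the skill id as Key and the worker id as Value.
    // Assumption: both keys are int columns.
    public static DataTable BuildTable(IEnumerable<KeyValuePair<int, int>> pairs)
    {
        var table = new DataTable();
        table.Columns.Add("Skill_SkillId", typeof(int));
        table.Columns.Add("Worker_WorkerId", typeof(int));

        foreach (var pair in pairs)
        {
            table.Rows.Add(pair.Key, pair.Value);
        }

        return table;
    }

    // Sends the rows through the SQL Server bulk-load interface
    // instead of issuing one parameterized INSERT per row.
    public static void Write(string connectionString, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "[dbo].[SkillWorkers]";
            bulk.ColumnMappings.Add("Skill_SkillId", "Skill_SkillId");
            bulk.ColumnMappings.Add("Worker_WorkerId", "Worker_WorkerId");
            bulk.WriteToServer(table);
        }
    }
}
```

To use something like this, the workers would first have to be saved without their Skills collection so that the database assigns their keys. The (skill id, worker id) pairs can then be collected from the saved entities and passed to BuildTable, and the resulting table to Write along with the connection string the context uses. The key property names on Skill and Worker are not shown in the question, so that mapping step is left out here.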

Upvotes: 1
