Reputation: 1680
I have a scenario where I need to perform DB updates for a large amount of data. There are foreign key relationships that need to be added at the same time, so I am getting a list of the foreign objects up front so I don't have to hit the database each time to check whether they exist, add them, etc.:
using(DbEntities db = new DbEntities())
{
// Get list of all books so don't have to hit every time
Dictionary<int, Book> books = db.Books.ToDictionary(k => k.BookId, v => v);
// Loop through big file to import each row
foreach(var item in bigFile)
{
Person person = new Person { FirstName = item.FirstName, LastName = item.LastName };
foreach(var book in item.Books)
{
if(!books.ContainsKey(book.BookId))
{
// Add book to DB if doesn't exist
Book bookToAdd = new Book { BookId = book.BookId, Name = book.Name };
db.Books.Add(bookToAdd);
books.Add(bookToAdd.BookId, bookToAdd);
}
person.Books.Add(books[book.BookId]);
}
db.People.Add(person);
}
db.SaveChanges();
}
The problem with this solution is that the import starts fast and then slows down progressively until it becomes really slow. This seems to be down to the context getting bloated with change tracking.
I have seen posts suggesting using db.Configuration.AutoDetectChangesEnabled = false
but when I do that the relationships do not get added. I can make that work by forcing DetectChanges()
but that seems to defeat the purpose, as I would have to call it on every iteration of the loop.
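For reference, the pattern I tried based on those posts looks roughly like this (just a sketch, not my actual import code):
using(DbEntities db = new DbEntities())
{
    // Speed up repeated Add() calls by turning off automatic change detection
    db.Configuration.AutoDetectChangesEnabled = false;
    foreach(var item in bigFile)
    {
        // ... build the Person and Book entities as in the code above ...

        // Without this the Person/Book relationships are not picked up,
        // but calling it per row brings the slowdown straight back
        db.ChangeTracker.DetectChanges();
    }
    db.SaveChanges();
}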
So, I moved the DB context inside the loop so it is recreated each time. Doing this means I can no longer have the detached list of books so I have to do an .Any()
and .Single()
call to the DB for each row (I don't know if that is a major performance issue, but I always try to hit the DB as infrequently as possible):
// Loop through big file to import each row
foreach(var item in bigFile)
{
// Create DB context each time
using(DbEntities db = new DbEntities())
{
Person person = new Person { FirstName = item.FirstName, LastName = item.LastName };
foreach(var book in item.Books)
{
if(!db.Books.Any(m => m.BookId = bookId))
{
// Add book to DB if doesn't exist
Book bookToAdd = new Book { BookId = bookId, Name = book.Name };
db.Books.Add(bookToAdd);
}
person.Books.Add(db.Books.Single(m => m.BookId = bookId));
}
db.People.Add(person);
db.SaveChanges();
}
}
This speeds it up considerably, but it still starts to slow down after around 5,000-10,000 rows, and I was wondering what my options are... other than just doing it all with stored procedures!
Upvotes: 2
Views: 2123
Reputation: 205589
IMO, neither solution is good. The first loads the whole existing Books table into memory (and into the db context), and the second performs 2 db queries per person book - one with Any and one with Single.
Since my tests do not show performance issues with context change tracking, I would use a variation of the first approach with elements from the second. Instead of loading the whole Books table, I would use a local dictionary populated on demand, with a single db query for each new book Id:
using (DbEntities db = new DbEntities())
{
// The local book dictionary
Dictionary<int, Book> books = new Dictionary<int, Book>();
// Loop through big file to import each row
foreach (var item in bigFile)
{
Person person = new Person { FirstName = item.FirstName, LastName = item.LastName };
foreach (var itemBook in item.Books)
{
Book book;
// Try get from local dictionary
if (!books.TryGetValue(itemBook.BookId, out book))
{
// Try get from db
book = db.Books.FirstOrDefault(e => e.BookId == itemBook.BookId);
if (book == null)
{
// Add book to DB if doesn't exist
book = new Book { BookId = itemBook.BookId, Name = itemBook.Name };
db.Books.Add(book);
}
// add to local dictionary
books.Add(book.BookId, book);
}
person.Books.Add(book);
}
db.People.Add(person);
}
db.SaveChanges();
}
Upvotes: 1
Reputation: 11337
db.Books.Any
I don't know if that is a major performance issue but always try to hit the DB as infrequently as possible
Yes, this is a major performance issue. For every book, you make a database round-trip which is quite inefficient.
Proposed Solution
(Yes, this is the same solution as your first example)
Make one database round-trip and use a dictionary instead.
// Or, if you only need the ids: var bookIds = db.Books.Select(x => x.BookId).ToDictionary(x => x);
var books = db.Books.ToDictionary(k => k.BookId, v => v);
// ...code...
if(!books.ContainsKey(book.BookId))
{
}
Add + AutoDetectChangesEnabled = false vs AddRange
I have seen posts suggesting using db.Configuration.AutoDetectChangesEnabled = false but when I do that the relationships do not get added.
Disabling AutoDetectChanges lets Add reach roughly the same performance as AddRange. However, if the relationships are not added when you do that, then that certainly becomes an issue!
Proposed Solution
Use AddRange instead
using(DbEntities db = new DbEntities())
{
var listToAdd = new List<Book>();
var personToAdd = new List<Person>();
// Get list of all books so don't have to hit every time
Dictionary<int, Book> books = db.Books.ToDictionary(k => k.BookId, v => v);
// Loop through big file to import each row
foreach(var item in bigFile)
{
Person person = new Person { FirstName = item.FirstName, LastName = item.LastName };
foreach(var book in item.Books)
{
if(!books.ContainsKey(book.BookId))
{
// Add book to DB if doesn't exist
Book bookToAdd = new Book { BookId = book.BookId, Name = book.Name };
// ADD to list instead
listToAdd.Add(bookToAdd);
// Also add to the dictionary so the lookup below works for new books
books.Add(bookToAdd.BookId, bookToAdd);
}
person.Books.Add(books[book.BookId]);
}
// ADD to list instead
personToAdd.Add(person);
}
// USE AddRange here instead
db.Books.AddRange(listToAdd);
db.People.AddRange(personToAdd);
db.SaveChanges();
}
SaveChanges
For every book and every person you need to add or update, a database round-trip is performed.
So if you need to insert 10,000 books, 10,000 database round-trips will be performed, which is INSANELY SLOW.
Proposed Solution
Use a library that allows you to perform bulk operations.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library allows you to perform all bulk operations:
Example:
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
operation.ColumnPrimaryKeyExpression =
customer => customer.Code;
});
EDIT: Answer sub-question
I see the benefit of AddRange but what if I was to extend that code example to not only add new Person entities but also update if they already exist?
You can use the same solution as you currently do with books, if everything can be loaded in memory.
var people = db.People.ToDictionary(k => k.PersonId, v => v);
And simply edit it.
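For example, something along these lines (just a sketch; it assumes your import rows carry a PersonId you can match on, like BookId - if the key is database-generated you would match on some natural key instead):
// 'people' is the dictionary loaded above
foreach (var item in bigFile)
{
    Person person;
    // Only add the person if it is not already known
    if (!people.TryGetValue(item.PersonId, out person))
    {
        // PersonId coming from the file is an assumption, like BookId
        person = new Person { PersonId = item.PersonId };
        db.People.Add(person);
        people.Add(person.PersonId, person);
    }
    // New or existing - just set the imported values; modified tracked
    // entities become UPDATEs when SaveChanges runs
    person.FirstName = item.FirstName;
    person.LastName = item.LastName;
}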
Keep in mind that if you don't have any concurrency check, you may overwrite modified values if the import takes a long time.
Be careful of this common pitfall.
Another technique could be using the BulkMerge method from my library.
Upvotes: 0
Reputation: 166
It sounds like you have a memory leak. I've previously used PerfView to compare what objects are in memory at different times. At a guess, I'd say that your context classes aren't being disposed of (i.e. they are being retained for some reason). If you have some experience with performance tools like these, that would be a good place to start; however, there is a steep learning curve if you haven't.
Personally, I would use a single stored procedure and one or more table-valued parameters to do large data imports like yours. In my experience they are considerably faster.
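For example, the client-side call with a table-valued parameter can be as small as this (just a sketch; dbo.PersonImportType, dbo.ImportPeople, the column layout and connectionString are placeholder names you would replace with your own, and it needs System.Data / System.Data.SqlClient):
// Build a DataTable that matches the user-defined table type
var table = new DataTable();
table.Columns.Add("FirstName", typeof(string));
table.Columns.Add("LastName", typeof(string));
table.Columns.Add("BookId", typeof(int));

foreach (var item in bigFile)
    foreach (var book in item.Books)
        table.Rows.Add(item.FirstName, item.LastName, book.BookId);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.ImportPeople", connection))
{
    command.CommandType = CommandType.StoredProcedure;

    // The whole file goes to the server in a single round-trip
    var parameter = command.Parameters.AddWithValue("@people", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.PersonImportType";

    connection.Open();
    command.ExecuteNonQuery();
}
The stored procedure would then insert/merge the rows into Books and People with set-based SQL, which is where the speed comes from.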
*edit
Just noticed some mistakes in your code... you're missing some comparison operators:
// Loop through big file to import each row
foreach(var item in bigFile)
{
// Create DB context each time
using(DbEntities db = new DbEntities())
{
Person person = new Person { FirstName = item.FirstName, LastName = item.LastName };
foreach(var book in item.Books)
{
if(!db.Books.Any(m => m.BookId == bookId))
{
// Add book to DB if doesn't exist
Book bookToAdd = new Book { BookId = bookId, Name = book.Name };
db.Books.Add(bookToAdd);
}
person.Books.Add(db.Books.Single(m => m.BookId == bookId));
}
db.People.Add(person);
db.SaveChanges();
}
}
Upvotes: 0