syscloud
syscloud

Reputation: 71

C# Entity Framework - Update column value in 500,000+ records

We need to process 500,000 records in a database by adding a certain value for a specific column in each record.

Currently, we are running multiple Tasks in parallel using TPL, each taking the records in batches (of size 1000) update the values, and writing them back to the database using a DBContext. This takes around 10 minutes to process.

Are there more efficient ways to process large databases?

EDIT - the value that we update with is generate dynamically, depending on the record information

Upvotes: 1

Views: 208

Answers (2)

David Anderson
David Anderson

Reputation: 13670

If you are unable to use T-SQL directly then change your approach to produce the T-SQL needed to run it directly. If the values must be calculated beforehand and are different for each record this will be a much faster approach than trying to use Entity Framework on such a large dataset.

Architecture and design of your codebase plays a key role here. These types of problems are why we cleanly separate the domain logic from data access logic, so the processing and computation of business rules and values does not interfere with how you need to persist them and visa-versa.

For example, if you had 500,000 business entity classes that you retrieve from a repository class and compute all the values for them, you could simply then enumerate all of them and produce the desired SQL or pass the new values and identities to your data access layer to perform an optimized bulk update.

The reason I did not provide code in this answer is because there are many ways to develop a solution to this problem using my suggested approach.

It is important to still understand that Entity Framework was still designed around the unit of work concept (at least EF6) and is not optimized for bulk workloads (except select query scenarios). A well-designed codebase will definitely have a mix of EF and T-SQL in the data access layer (or database via functions and stored procedures) to handle performance critical operations.

Upvotes: 0

D Stanley
D Stanley

Reputation: 152566

Are there more efficient ways to process large databases?

Run a SQL statement to change all of the data at once. Don't feel like you have to use entities for every DB update - there's still nothing wrong with running SQL scripts on the back-end database directly. There are methods within EF to run custom SQL, or you could have a separate "support" app that does not use EF but manages the data directly.

Upvotes: 6

Related Questions