Reputation: 2617
We have a use case in our application where a user triggers a request that will cause an insert of 100 to 1000 rows.
After that insert, we need to continue processing and create more objects that hold foreign keys to the originally inserted rows; in other words, we need the primary key IDs of the inserted objects.
So far we have used EF to do that in a foreach loop, but this was too slow, taking about 15-20 seconds to complete for about 600 rows (while blocking the user, bad :( ).
Original code (it also handles updates, but we do not care about performance there, since it is not blocking users):
foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));
    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;
        // set values for update
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");
        // insert a new location <============ This is the bottleneck, takes about 20-40ms per row
        db.Locations.Add(updatedLoc);
    }
}
// This actually takes about 3 seconds for 600 rows, was actually acceptable
db.SaveChanges();
So after researching on SO and the internet, I found out I was using EF the wrong way and needed to use SqlBulkCopy instead.
And thus the code was rewritten; what used to take ~20 seconds now takes ~100 ms (!)
foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));
    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;
        // set values for update
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");
        // insert a new location: add a row to the DataTable instead of the EF context
        dataTable.Rows.Add(new object[] { /* ...the 14 fields of the location... */ });
    }
}
System.Diagnostics.Trace.WriteLine("preparing to bulk insert", "loadSimple");
// perform the bulk insert
using (var bulkCopy = new System.Data.SqlClient.SqlBulkCopy(System.Configuration.ConfigurationManager.ConnectionStrings["bulk-inserter"].ConnectionString))
{
bulkCopy.DestinationTableName = "Locations";
for (int i = 0; i < dataTable.Columns.Count; i++)
{
bulkCopy.ColumnMappings.Add(i, i + 1);
}
bulkCopy.WriteToServer(dataTable);
}
// for update
db.SaveChanges();
The problem is, after the bulk copy, the objects in the Locations collection, which is part of the EF ORM, are not changed (that is okay and expected), but I need the inserted IDs to continue working on these objects.
A simple solution would be to immediately select the data again from the database; I have the data at hand, so I can simply re-select it into a different collection.
But that solution feels incorrect. Is there no way to get the IDs as part of the insert?
EDIT: the simple solution works, see the accepted answer below on how to easily sync it back to EF.
Maybe I should not use SqlBulkCopy (I expect up to about 1000 rows, no more) and use something else?
Please note: a few related SO questions and solutions all seem to move away from EF.
Upvotes: 4
Views: 2672
Reputation: 14133
Set:
yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;
Do SaveChanges() in batches of 100 inserts... try with 1000 and see the difference.
Since the context is the same during all these inserts, you can also rebuild your context object every 1000 inserts: var yourContext = new YourContext();
Making these improvements in a data import process of mine took it from 7 minutes to 6 seconds.
The actual numbers... may not be 100s or 1000s in your case... try it and tweak it.
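A minimal sketch of that pattern, assuming a DbContext subclass named YourContext, an entity type YourEntity and a collection entitiesToInsert (names invented here for illustration):
YourContext context = new YourContext();
context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

int count = 0;
foreach (YourEntity entity in entitiesToInsert)
{
    context.Set<YourEntity>().Add(entity);

    // flush every 100 entities and rebuild the context so the change tracker stays small
    if (++count % 100 == 0)
    {
        context.SaveChanges();
        context.Dispose();
        context = new YourContext();
        context.Configuration.AutoDetectChangesEnabled = false;
        context.Configuration.ValidateOnSaveEnabled = false;
    }
}

context.SaveChanges();   // flush the remainder
context.Dispose();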
Upvotes: 0
Reputation: 69769
If you are using SQL Server 2008 or later, you could use a stored procedure to do what you are after. You would need to define a TYPE in SQL that matches your data table:
CREATE TYPE dbo.YourType AS TABLE (ID INT, Column1 INT, Column2 VARCHAR(5)...)
Then pass this type to a stored procedure.
CREATE PROCEDURE dbo.InsertYourType (@YourType dbo.YourType READONLY)
AS
BEGIN
    DECLARE @ID TABLE (ID INT NOT NULL PRIMARY KEY)

    INSERT INTO YourTable (Column1, Column2...)
    OUTPUT inserted.ID INTO @ID
    SELECT Column1, Column2...
    FROM @YourType

    SELECT *
    FROM YourTable
    WHERE ID IN (SELECT ID FROM @ID)
END
This will capture the IDs of the inserted rows and return all new rows. As long as your C# DataTable conforms to the format of dbo.YourType, you can pass it in the same way you would normally pass a parameter to a SqlCommand:
SqlCommand.Parameters.Add("@YourType", YourDataTable)
I realise this is similar to your proposal of reselecting the data, but the select should be fast as it only uses the identity column. While you still have the problem of using a SQL insert rather than the bulk copy, you are reverting to a more set-based solution rather than the procedural solution of EF. This is quite similar to the accepted answer in one of the links you have posted, but I've removed a couple of stages by using a table variable.
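For completeness, a rough sketch of that call from ADO.NET, assuming a DataTable named yourDataTable whose columns match dbo.YourType and reusing the "bulk-inserter" connection string from the question; the parameter has to be marked as Structured and given the SQL type name:
using (var connection = new System.Data.SqlClient.SqlConnection(System.Configuration.ConfigurationManager.ConnectionStrings["bulk-inserter"].ConnectionString))
using (var command = new System.Data.SqlClient.SqlCommand("dbo.InsertYourType", connection))
{
    command.CommandType = System.Data.CommandType.StoredProcedure;

    var parameter = command.Parameters.AddWithValue("@YourType", yourDataTable);
    parameter.SqlDbType = System.Data.SqlDbType.Structured;   // table-valued parameter
    parameter.TypeName = "dbo.YourType";

    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // each row is a newly inserted record, with its identity value in the ID column
            int newId = reader.GetInt32(reader.GetOrdinal("ID"));
        }
    }
}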
Upvotes: 4
Reputation: 126547
Nothing you do through the EF will ever be as fast as SqlBulkCopy. Indeed, raw SQL INSERTs aren't as fast. So you just need to re-read Locations. Refresh the query by re-reading with MergeOption.OverwriteChanges.
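A sketch of what that re-read could look like against the question's DbContext, assuming EF 4/5 where MergeOption lives in System.Data.Objects (it moved to System.Data.Entity.Core.Objects in EF6), and reusing db, Location, ExtId and locationsLoaded from the question:
// drop down to the ObjectContext behind the DbContext
var objectContext = ((System.Data.Entity.Infrastructure.IObjectContextAdapter)db).ObjectContext;

// OverwriteChanges makes the re-read overwrite any stale values already tracked by the context
var locationSet = objectContext.CreateObjectSet<Location>();
locationSet.MergeOption = System.Data.Objects.MergeOption.OverwriteChanges;

// re-select the rows that were just bulk-inserted; their identity IDs come back with them
var extIds = locationsLoaded.Select(l => l.ExtId).ToList();
var refreshedLocations = locationSet.Where(l => extIds.Contains(l.ExtId)).ToList();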
Upvotes: 4