Martin Kirk
Martin Kirk

Reputation: 340

EF Core (.NET 5 + EFCore.BulkExtentions) : BulkInsert to SQL Server using the IncludeGraph not super fast

I've constructed a multithreaded program that reads an 80GB XML file and inserts entities into SQL Server using EF Core bulk extensions.

The XML is a list of graphs that i want to insert as fast as possible.

I need to insert the objects as a graph to maintain their relations where i afterwords intend to run a series of cleanups to remove duplicates.

My main problem right now is that the

BulkInsert(entities, EncludeGraph = true) 

isn't as fast as I want it to be.

The entities are created using model-first, where I've disabled the CascadeDelete option from the foreign key constraints. Each navigation property is automatically generated and creates a nonclustered index (which I suppose is OK).

Any advice on how to improve the insert speed?

Upvotes: 0

Views: 2784

Answers (1)

Martin Kirk
Martin Kirk

Reputation: 340

After reading the Sourcecode of EFCore.BulkInsert it's pretty obvious that the way that it inserts a whole graph of objects, is by first bulk inserting all the parents - then retrieve the ID's, update the 1st level childrens parentid then bulk insert them etc until the final leaf nodes.

This is suboptimal since it takes several roundtrips to complete a medium nested graph.

The only way i can i imagine making this faster, is by doing identi-insert by manually creating the Identity values outside of SQL and then traverse the graph and update all child elements with their parent id.

THEN bulk insert using identity-insert without any roundtrips.

UPDATE:

I made it all work quite nicely. Inserting about 30.000 rows pr second (50-60MB/s SQL write speed) and it finishes the 80GB XML file in about 45 minutes. Which is quite acceptable compared to the previous 12+ hour process.

I've made a series of threads that does single tasks:

Thread 1. Read from Zipped XML Archive into a ConcurrentQueue

Thread 2. Scan the Queue for Start-Stop elements. Merging them using StringBuilder into ConcurrentQueue

Thread 3. Remove namespace from XML-entity and add to ConcurrentQueue

Thread 4. Deserialize XML into Object and add to ConcurrentQueue

Thread 5. Create Lists of 1000 Objects and add to ConcurrentQueue<List>

Thread 6. Use GraphUtil.GetTopologicallySortedGraph from EFCore.BulkExtensions to split the object into List of Entity Types and insert to DB using BulkInsert ( KeepIdentity=true )

var grp = GraphUtil.GetTopologicallySortedGraph(db, list).GroupBy(g => g.Entity.GetType());

foreach (var typecollection in grp)
{
    var entityClrType = typecollection.Key;

    var sublist = typecollection.Select(s => s.Entity).ToList();

    ObjectCount += sublist.Count;

    db.BulkInsert(sublist, config, type: entityClrType);
}

Note: When Deserializing the XML's their ID's are created:

private static int _EntityID;

public Entity()
{
     EntityID= ++_EntityID;
}

Note: I'm using EFCore Codefirst mechanics, which automatically handles all navigation properties and their relation ID - very important ! - updating the child ID automatically ensures the parent relation.

Note: I'm running 1 Deserializer (can't run more than one because of static identity counters) and 6 BulkInserter threads

Upvotes: 1

Related Questions