Koji
Koji

Reputation: 589

EF Core deadlock with multi threads + BeginTransaction + Commit

I have some questions regarding how SaveChangesAsync() and BeginTransaction() + transaction.Commit() work.

My team has a .NET Core worker that receives events from Microsoft EventHub and saves data into SQL server via EF Core 3.
One of event types has a lot of data, so we created a few tables, separate data and then save it into these tables. The child tables reference the parent table's id column (FK_Key).
Some data in the DB has to be deleted before new data is saved under certain conditions, so we delete -> upsert data.

To save data into the DB, we call dbContext.Database.BeginTransaction() and transaction.Commit(). When we run the worker, we get deadlock exception like Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

I found that one of .BatchDeleteAsync() in PurgeDataInChildTables() or one of BulkInsertOrUpdateAsync() in Upsert() throws a deadlock exception (it changes every time I run the worker).

Here is the code:

public async Task DeleteAndUpsert(List<MyEntity> entitiesToDelete, List<MyEntity> entitiesToUpsert)
{
    if (entitiesToDelete.Any())
        await myRepository.Delete(entitiesToDelete);

    if (entitiesToUpsert.Any())
        await myRepository.Upsert(entitiesToUpsert);
}


public override async Task Upsert(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        using (var transaction = dbContext.Database.BeginTransaction())
        {
            await PurgeDataInChildTables(entities, dbContext);
            await dbContext.BulkInsertOrUpdateAsync(entities);
            // tables that depends on the parent table (FK_Key)
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany<Child1>(x => x.Id).ToList());
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany<Child2>(x => x.Id).ToList());
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany<Child3>(x => x.Id).ToList());
            transaction.Commit();
        }
    }
}

public override async Task Delete(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        using (var transaction = dbContext.Database.BeginTransaction())
        {
            await PurgeDataInChildTables(entities, dbContext);
            await dbContext.BulkDeleteAsync(entities);
            transaction.Commit();
        }
    }
}

private async Task PurgeDataInChildTables(IList<MyEntity> entities, MyDbContext dbContext)
{
    var ids = entities.Select(x => x.Id).ToList();

    await dbContext.Child1.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
    await dbContext.Child2.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
    await dbContext.Child3.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
}

When the worker starts up, it creates four threads and they all upsert to the same table (and delete too). So, I assume that deadlock occurs when one thread starts a transaction and another starts another transaction (or something similar..) and then try to upsert to (or delete from) child tables.
I tried some stuff to solve the issue and noticed that deadlock seems to be resolved when I remove BeginTransaction() and use SaveChangesAsync() instead.

Here is the modified code:

public override async Task Upsert(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        await PurgeDataInChildTables(entities, dbContext);
        await dbContext.BulkInsertOrUpdateAsync(entities);
        // tables that depends on the parent table (FK_Key)
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child1).ToList());
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child2).ToList());
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child3).ToList());
        await dbContext.SaveChangesAsync();
    }
}

public override async Task Delete(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        await PurgeDataInChildTables(entities, dbContext);
        await dbContext.BulkDeleteAsync(entities);
        await dbContext.SaveChangesAsync();
    }
}

Deadlock was occuring ~ 30 seconds after the worker starts up but it didn't occur for 2 ~ 3 mins when I modified the code, so I think the issue is resolved, thought it might still occur if I run the worker longer.

Finally, here are my questions:

Upvotes: 4

Views: 10312

Answers (1)

cassandrad
cassandrad

Reputation: 3526

Hard to say precisely without looking into profiling session of the database. The thing that needs to be looked up there is what kind of locks are taken (where it is shared and where it is exclusive or update) and when transaction are actually opened. I will describe a theoretical behaviour that needs to be proved with actual database profiling.

When you wrap everything with Database.BeginTransaction():
Isolation level isn't set by EF, it uses database default isolation level. In case of Microsoft SQL Server it will be Read committed. This isolation level says that concurrent transactions can read data, but if there is ongoing modification, other transactions will wait for it to complete, even if they want just read. Transaction will be held before Commit() is called.

When you don't specify transaction explicitly:
Select statements and SaveChangesAsync will lead to separate transactions with the same isolation level defaulted to database. Transaction isn't held longer than it needs: in case of, for example, SaveChangesAsync, it will be there while all changes are written, starting from the moment when the method is called.

Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This message appears when there are several transactions trying to get access to some resource, and one of them is trying to read data and the other one is trying to modify. In that case, to avoid dead lock, database will try to kill a transaction that will require lower amount of resources to rollback. In your case — it's a transaction that tries to read. Reads are lightwight in terms or weight of rollback.

Summarizing:
When you have one huge lock that holds one resource a huge amount of time, it stops other workers from accessing that resource as database just kills other workers' transactions when they try to read probably at that var ids = entities.Select(x => x.Id).ToList(); point. When you rewrote your code, you got rid of long locks. More to that, as I can see from documentation to BulkInsertOrUpdateAsync, this extension uses internal transactions on each call, not affecting and not involving EF context. If that so, then it means that actual transactions live even less than one call to SaveChangesAsync when data changed not with the extension, but in usual EF way.

Upvotes: 2

Related Questions