Reputation: 708
A .NET 5.6.1 application based on the Task Parallel Library stopped responding in a call to NpgsqlConnection.Open() in Npgsql.dll 3.1.6.
Further investigation showed that hundreds of threads servicing one particular connection string were blocked in Monitor.Enter. The waiting threads all had the same call stack:
[GCFrame: 000000cd8922cec8]
[GCFrame: 000000cd8922d0f0]
[HelperMethodFrame: 000000cd8922d128] System.Threading.Monitor.Enter(System.Object)
Npgsql.ConnectorPool.Allocate(Npgsql.NpgsqlConnection, Npgsql.NpgsqlTimeout)
Npgsql.NpgsqlConnection.OpenInternal()
The lock dump pointed to an orphaned lock: the MonitorHeld value corresponded to 1 owner plus 546 waiters (1 + 546*2 = 1093), and the owning thread was dead (Thread is 0):
Index SyncBlock MonitorHeld Owning Thread Info Owner
220427 000000d26dd7b028 1093 1 0 XXX Npgsql.ConnectorPool
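The arithmetic behind that reading can be checked with a small sketch. It assumes the encoding described above (the owner contributes 1 and each waiter contributes 2). The function name `decode_monitor_held` is hypothetical and not part of any debugger tooling.

```python
def decode_monitor_held(value):
    """Split a MonitorHeld value into (owners, waiters).

    Assumes the encoding used in the post: 1 for the owner, 2 per waiter.
    """
    owners = value & 1    # lowest bit: is the lock currently held?
    waiters = value >> 1  # remaining bits: number of waiting threads
    return owners, waiters


if __name__ == "__main__":
    owners, waiters = decode_monitor_held(1093)
    print(owners, waiters)  # prints "1 546"
```

An odd value therefore means the lock is held, and in this dump it is held by a thread that no longer exists.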
We found no exceptions thrown from inside the unsafe block of code in ConnectorPool.Allocate() that would help explain the orphaned lock.
There was no reason to believe our code caused the thread to die prematurely either: nowhere in the application did we explicitly call Thread.Abort().
At this point we ran out of ideas. Thanks for looking into this.
Upvotes: 2
Views: 373
Reputation: 708
Update:
A member of our team submitted a bug report to Npgsql.
After a more thorough search through our logs, we found a single instance of a NullReferenceException thrown from within ConnectorPool.Allocate() in Npgsql.dll version 3.1.6, caused by a race condition. Judging by ConnectorPool.cs in the current Npgsql repository, this particular issue was fixed in version 3.1.7 by converting the static _pruningTimer into an instance field.
Conclusion: upgrading from Npgsql 3.1.6 to 3.1.7 resolved the deadlock in ConnectorPool.Allocate().
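For readers wondering how a single exception can leave a lock orphaned, here is a language-agnostic sketch in Python. It assumes the lock is taken with a manual acquire/release pair and no try/finally around the critical section. Whether ConnectorPool.Allocate() in 3.1.6 was written that way is an assumption on my part, not something confirmed above. All names are hypothetical.

```python
import threading


def allocate_without_finally(lock):
    # Manual acquire/release: an exception in between skips the release.
    lock.acquire()
    raise RuntimeError("simulated failure inside the critical section")
    lock.release()  # never reached


def allocate_with_finally(lock):
    # The context manager releases the lock even when the body throws.
    with lock:
        raise RuntimeError("simulated failure inside the critical section")


def run_and_probe(allocate):
    """Run `allocate` on a worker thread, then report whether a second
    caller can still take the lock."""
    lock = threading.Lock()

    def worker():
        try:
            allocate(lock)
        except RuntimeError:
            pass  # the worker thread ends here

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    # Use a timeout so the probe fails fast instead of hanging forever.
    return lock.acquire(timeout=0.2)


if __name__ == "__main__":
    print(run_and_probe(allocate_without_finally))  # prints "False"
    print(run_and_probe(allocate_with_finally))     # prints "True"
```

In the first case the worker is gone but the lock is still held, so every later caller waits on it, which matches the pile-up of threads in Monitor.Enter described in the question.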
Upvotes: 2