
Is there any way to work around OS loader lock deadlocks caused by third-party libraries?

I have an interesting problem that I haven't seen documented anywhere else (at least not this specific issue).

The issue involves COM, VB6, and .NET, and getting them to play nicely together.

Here's what I have:

A legacy VB6 ActiveX DLL (written by us)
A multi-threaded Windows service written in C# that processes requests from clients over the network and sends back results. It does this by creating a new STA thread to handle each request. Each request-handler thread instantiates a COM object (defined in the ActiveX DLL) to process the request and get the result (a string of XML is passed in, and it returns a string of XML back), explicitly releases the COM object, and exits. The service then sends the result back to the client.
All of the network code is handled using asynchronous networking (i.e. thread pool threads).

And yes, I know this is already a risky thing to be doing in the first place, since VB6 isn't very friendly with multi-threaded applications to begin with, but unfortunately it's what I am stuck with for the moment.

I've already fixed a number of things that were causing deadlocks in the code (for example, making sure the COM objects are actually created and called from a separate STA thread, making sure to explicitly release the COM objects before the thread exits to prevent deadlocks that were occurring between the garbage collector and the COM Interop code, etc.), but there is one deadlock scenario that I just can't seem to solve.

With some help from WinDbg, I was able to figure out what is happening, but I'm not sure how (or if) there is a way around this particular deadlock.

What's happening

If one request-handler thread is exiting, and another request-handler thread is starting at the same time, a deadlock can occur because of the way the VB6 runtime initialization and termination routines seem to work.

The deadlock occurs in the following scenario:

The new thread that is starting up is in the middle of creating a new instance of the (VB6) COM object to process an incoming request. At this point, the COM runtime is in the middle of a call to retrieve the object's class factory. The class factory implementation is in the VB6 runtime itself (MSVBVM60.dll). That is, it's calling the VB6 runtime's DllGetClassObject function. This, in turn, calls an internal runtime function (MSVBVM60!CThreadPool::InitRuntime), which acquires a mutex and enters a critical section to do part of its work. At this point, it's about to call LoadLibrary to load oleaut32.dll into the process, while holding this mutex. So, now it's holding this internal VB6 runtime mutex and waiting for the OS loader lock.
The thread that is exiting is already running inside the loader lock, because it's done executing managed code and is executing inside the KERNEL32!ExitThread function. Specifically, it's in the middle of handling the DLL_THREAD_DETACH notification for MSVBVM60.dll on that thread, which in turn calls a method to terminate the VB6 runtime on the thread (MSVBVM60!CThreadPool::TerminateRuntime). Now, this thread tries to acquire the same mutex that the other thread being initialized already has.

A classic deadlock. Thread A has L1 and wants L2, but Thread B has L2 and needs L1.
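To make the lock-order inversion concrete, here is a minimal sketch that models it with two ordinary .NET monitors. The names LockOrderDemo, LoaderLock, and RuntimeMutex are stand-ins invented for illustration; managed code cannot take the real OS loader lock or the VB6 runtime's internal mutex, and timeouts are used so the demo reports the stall instead of hanging.

```csharp
using System;
using System.Threading;

public static class LockOrderDemo
{
    // Stand-ins only: these objects merely model the OS loader lock
    // and the VB6 runtime's internal mutex described above.
    private static readonly object LoaderLock = new object();
    private static readonly object RuntimeMutex = new object();

    // Returns true when both threads time out waiting for the other's lock.
    public static bool BothThreadsStall()
    {
        bool startingStalled = false;
        bool exitingStalled = false;
        TimeSpan timeout = TimeSpan.FromMilliseconds(200);

        using (var barrier = new Barrier(2))
        {
            // Models the starting thread: runtime mutex first, loader lock second.
            var starting = new Thread(() =>
            {
                lock (RuntimeMutex)
                {
                    barrier.SignalAndWait();   // both threads now hold their first lock
                    bool got = Monitor.TryEnter(LoaderLock, timeout);
                    if (got) Monitor.Exit(LoaderLock);
                    startingStalled = !got;
                    barrier.SignalAndWait();   // keep holding until both attempts finish
                }
            });

            // Models the exiting thread: loader lock first, runtime mutex second.
            var exiting = new Thread(() =>
            {
                lock (LoaderLock)
                {
                    barrier.SignalAndWait();
                    bool got = Monitor.TryEnter(RuntimeMutex, timeout);
                    if (got) Monitor.Exit(RuntimeMutex);
                    exitingStalled = !got;
                    barrier.SignalAndWait();
                }
            });

            starting.Start();
            exiting.Start();
            starting.Join();
            exiting.Join();
        }

        return startingStalled && exitingStalled;
    }
}
```

Calling LockOrderDemo.BothThreadsStall() returns true, because each thread holds the lock the other one needs. If both threads took the two locks in the same order, neither would stall, which is the reasoning behind wanting the VB6 initialization to run inside the loader lock.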

The problem (if you've followed me this far) is I don't have any control over what the VB6 runtime is doing in its internal thread initialization and teardown routines.

In theory, if I could force the VB6 runtime initialization code to run inside the OS loader lock, I would prevent the deadlock, because I am fairly certain the mutex the VB6 runtime is holding is specifically only used inside the initialization and termination routines.

Requirements

I can't make the COM calls from a single STA thread, because then the service won't be able to handle concurrent requests. I can't have a long-running request block other client requests either. This is why I create one STA thread per request.
I need to create a new instance of the COM object on each thread, because I need to make sure each instance has its own copy of global variables in the VB6 code (VB6 gives each thread its own copy of all global variables).

Solutions I've tried that didn't work

Converted ActiveX DLL to ActiveX EXE

First, I tried the obvious solution and created an ActiveX EXE (out-of-process server) to handle the COM calls. Initially, I compiled it so that a new ActiveX EXE (process) was created for each incoming request, and I also tried it with the Thread Per Object compile option (one process instance is created, and it creates each object on a new thread within the ActiveX EXE).

This fixes the deadlock issue with respect to the VB6 runtime, because the VB6 runtime never gets loaded into the .NET code proper. However, this led to a different problem: if concurrent requests come into the service, the ActiveX EXE tends to fail randomly with RPC_E_SERVERFAULT errors. I assume this is because the COM marshalling and/or the VB6 runtime can't deal with concurrent object creation/destruction, or concurrent method calls, inside the ActiveX EXE.

Force the VB6 code to run inside the OS loader lock

Next, I switched back to using an ActiveX DLL for the COM class. To force the VB6 runtime to run its thread initialization code inside the OS loader lock, I created a native (Win32) C++ DLL, with code to handle DLL_THREAD_ATTACH in DllMain. The DLL_THREAD_ATTACH code calls CoInitialize and then instantiates a dummy VB6 class to force the VB6 runtime to be loaded and force the runtime initialization routine to run on the thread.

When the Windows service starts, I use LoadLibrary to load this C++ DLL into memory, so that any threads created by the service will execute that DLL's DLL_THREAD_ATTACH code.

The problem is that this code runs for every thread the service creates, including the .NET garbage collector thread and the thread-pool threads used by the async networking code, which doesn't end well (this just seems to cause the threads to never start properly, and I imagine initializing COM on the GC and thread-pool threads is in general just a very bad idea).

Addendum

I just realized why this is a bad idea (and probably part of the reason it didn't work): it isn't safe to call LoadLibrary when you are holding the loader lock. See Remarks section in this MSDN article: http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx, specifically:

Threads in DllMain hold the loader lock so no additional DLLs can be dynamically loaded or initialized.

Is there any way to work around these issues?

So, my question is, is there any way to work around the original deadlock issue?

The only other thing I can think of is to create my own lock object and surround the code that instantiates the COM object in a .NET lock block, but then I have no way (that I know of) to put the same lock around the (operating system's) thread exit code.

Is there a more obvious solution to this issue, or am I plain out of luck here?

Upvotes: 4

Answers (5)

david

Reputation: 21

About 20 years ago I wrote some rather complex code in VB6 and VC6, and I recently needed to port it to Visual Studio .NET. I took the functions as originally written, along with their header files, corrected all of the compile errors (there were many), and tried to load the result, only to get a "loaderlock closed" error. I then redid all of the files, starting with those that few other files depended on and working my way up, and in each file I included only the header files that it actually required. The result now loads just fine, with no more "loaderlock closed" error. The lesson for me: don't include any more header files in a given .cpp file than is absolutely necessary. Hope this helps.

from a very happy camper!!

david

Upvotes: 0

Joshua

Reputation: 43317

I don't see any reason why you couldn't load an extra instance of the ActiveX control in your startup code and just hang onto the reference. Presto, no more loader lock issues since the VB6 runtime never shuts down.
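A rough sketch of what holding onto such a reference could look like, assuming the instance is created on a dedicated STA thread that lives as long as the service. KeepAliveInstance and its factory parameter are hypothetical names, not part of any library, and whether this avoids the per-thread teardown that runs on DLL_THREAD_DETACH is something this thread does not confirm.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper: creates one instance on a long-lived STA thread
// and keeps it referenced until Dispose is called.
public sealed class KeepAliveInstance : IDisposable
{
    private readonly ManualResetEvent _ready = new ManualResetEvent(false);
    private readonly ManualResetEvent _shutdown = new ManualResetEvent(false);
    private readonly Thread _thread;
    private Exception _error;
    private bool _disposed;

    public KeepAliveInstance(Func<object> factory)
    {
        _thread = new Thread(() =>
        {
            object instance = null;
            try
            {
                instance = factory();   // e.g. creates the dummy VB6 class
            }
            catch (Exception ex)
            {
                _error = ex;
            }
            finally
            {
                _ready.Set();
            }
            if (instance == null) return;

            // Hold the reference until shutdown is requested.
            _shutdown.WaitOne();
            if (Marshal.IsComObject(instance))
                Marshal.ReleaseComObject(instance);
        });
        _thread.IsBackground = true;
        // STA is only available on Windows; elsewhere this returns false.
        _thread.TrySetApartmentState(ApartmentState.STA);
        _thread.Start();
        _ready.WaitOne();
        if (_error != null)
            throw new InvalidOperationException("Could not create the keep-alive instance.", _error);
    }

    public bool IsAlive { get { return _thread.IsAlive; } }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        _shutdown.Set();
        _thread.Join();
        _ready.Dispose();
        _shutdown.Dispose();
    }
}
```

In the service, the factory would create the COM object, for example with Activator.CreateInstance(Type.GetTypeFromProgID("MyVb6Lib.Dummy", true)), where "MyVb6Lib.Dummy" is a made-up ProgID. The holder would be created once at startup and disposed at shutdown.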

Upvotes: 1

Mike Spross

Reputation: 8109

Since I'm still exploring my options, I wanted to see whether I could implement a solution in pure .NET code, without any native code, for the sake of simplicity. I'm not sure yet whether this is a foolproof solution, because I'm still trying to figure out whether it actually gives me the mutual exclusion I need or just looks like it does.

Any thoughts or comments are welcome.

The relevant part of the code is below. Some notes:

The HandleRpcRequest method is called from a thread-pool thread when a new message is received from a remote client
This fires off a separate STA thread so that it can make the COM call safely
DbRequestProxy is a thin wrapper class around the real COM class I'm using
I used a ManualResetEvent (_safeForNewThread) to provide the mutual exclusion. The basic idea is that this event stays unsignaled (blocking other threads) if any one particular thread is about to exit (and hence potentially about to terminate the VB6 runtime). The event is only signaled again after the current thread completely terminates (after the Join call finishes). This way multiple request-handler threads can still execute concurrently unless an existing thread is exiting.

So far, I think this code is correct and guarantees that two threads can't deadlock in the VB6 runtime initialization/termination code anymore, while still allowing them to execute concurrently for most of their execution time, but I could be missing something here.

using System;
using System.Threading;

public class ClientHandler {

    private static ManualResetEvent _safeForNewThread = new ManualResetEvent(true);

    private void HandleRpcRequest(string request)
    {

        Thread rpcThread = new Thread(delegate()
        {
            DbRequestProxy dbRequest = null;

            try
            {
                Thread.BeginThreadAffinity();

                string response = null;

                // Creates a COM object. The VB6 runtime initializes itself here.
                // Other threads can be executing here at the same time without fear
                // of a deadlock, because the VB6 runtime lock is re-entrant.

                dbRequest = new DbRequestProxy();

                // Call the COM object
                response = dbRequest.ProcessDBRequest(request);

                // Send response back to client
                _messenger.Send(Messages.RpcResponse(response), true);
            }
            catch (Exception ex)
            {
                _messenger.Send(Messages.Error(ex.ToString()));
            }
            finally
            {
                if (dbRequest != null)
                {
                    // Force release of COM objects and VB6 globals
                    // to prevent a different deadlock scenario with VB6
                    // and the .NET garbage collector/finalizer threads
                    dbRequest.Dispose();
                }

                // Other request threads cannot start right now, because
                // we're exiting this thread, which will detach the VB6 runtime
                // when the underlying native thread exits

                _safeForNewThread.Reset();
                Thread.EndThreadAffinity();
            }
        });

        // Make sure we can start a new thread (i.e. another thread
        // isn't in the middle of exiting...)

        _safeForNewThread.WaitOne();

        // Put the thread into an STA, start it up, and wait for
        // it to end. If other requests come in, they'll get picked
        // up by other thread-pool threads, so we won't usually be blocking anyone
        // by doing this (although we are blocking a thread-pool thread, so
        // hopefully we don't block for *too* long).

        rpcThread.SetApartmentState(ApartmentState.STA);
        rpcThread.Start();
        rpcThread.Join();

        // Since we've joined the thread, we know at this point
        // that any DLL_THREAD_DETACH notifications have been handled
        // and that the underlying native thread has completely terminated.
        // Hence, other threads can safely be started.

        _safeForNewThread.Set();

    }
}
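One caveat about the code above: the event is reset only inside the worker's finally block, so a handler that has already passed WaitOne can still start a new worker while this one is exiting, and a handler whose Join has returned can signal the event while a different worker is still exiting. A possible stricter variation, untested against the VB6 runtime, uses a ReaderWriterLockSlim. Object creation takes the read side, and the handler takes the write side before it lets its worker return. GatedWorker, create, and call are names made up for this sketch.

```csharp
using System;
using System.Threading;

public static class GatedWorker
{
    // Read side: a worker creating its per-thread object (runtime init).
    // Write side: a handler letting its worker thread exit (runtime teardown).
    private static readonly ReaderWriterLockSlim Gate = new ReaderWriterLockSlim();

    public static string Run<T>(Func<T> create, Func<T, string> call)
    {
        string response = null;
        Exception error = null;

        using (var workDone = new ManualResetEvent(false))
        using (var mayExit = new ManualResetEvent(false))
        {
            var worker = new Thread(() =>
            {
                try
                {
                    T instance;
                    Gate.EnterReadLock();
                    try { instance = create(); }
                    finally { Gate.ExitReadLock(); }

                    // The call (and any explicit release of the object)
                    // runs outside the gate, so requests stay concurrent.
                    response = call(instance);
                }
                catch (Exception ex)
                {
                    error = ex;
                }
                finally
                {
                    workDone.Set();
                    // Do not return, and so do not begin exiting, until
                    // the handler holds the write side of the gate.
                    mayExit.WaitOne();
                }
            });
            worker.TrySetApartmentState(ApartmentState.STA);
            worker.Start();

            workDone.WaitOne();
            Gate.EnterWriteLock();   // waits for in-flight creations to finish
            try
            {
                mayExit.Set();
                worker.Join();       // the worker exits while creations are blocked
            }
            finally
            {
                Gate.ExitWriteLock();
            }
        }

        if (error != null)
            throw new InvalidOperationException("Request failed.", error);
        return response;
    }
}
```

This only serializes request-handler threads against each other. Threads that the service does not create through Run can still exit at any time, so it narrows the window without closing it.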

Upvotes: 0

Harry Johnston

Reputation: 36338

EDIT: in retrospect, I don't think this will work. The problem is that the deadlock can occur at any time that a Win32 thread exits, and since Win32 threads don't map 1:1 to .NET threads, we can't (within .NET) force Win32 threads to acquire the lock before exiting. In addition to the possibility of the .NET thread that is exiting being switched to a different OS thread, there are presumably OS threads not associated with any .NET thread (garbage collection and the like) which may start and exit at random.

The only other thing I can think of is to create my own lock object and surround the code that instantiates the COM object in a .NET lock block, but then I have no way (that I know of) to put the same lock around the (operating system's) thread exit code.

That sounds like a promising approach. I gather from this that you are able to modify the service's code, and you say each thread explicitly releases the COM object before exiting, so presumably you could claim a lock at this point, either just before explicitly releasing the COM object or just after. The secret is to choose a type of lock that is implicitly released once the thread holding it has exited, such as a Win32 mutex.

It is likely that a Win32 mutex object does not become abandoned until the thread has completed all DLL_THREAD_DETACH calls, although I don't know whether this behaviour is documented. I'm not familiar with locking in .NET, but my guess is that its lock types are unlikely to be suitable: even if the right kind of lock exists, it would probably be considered abandoned as soon as the thread reaches the end of the managed code section, i.e., before the DLL_THREAD_DETACH calls.
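For reference, .NET exposes this kind of lock as System.Threading.Mutex, which wraps a Win32 mutex on Windows. The sketch below only shows the abandonment semantics: a thread acquires the mutex and exits without releasing it, and the next waiter gets ownership along with an AbandonedMutexException. It does not show when the abandonment happens relative to the DLL_THREAD_DETACH calls, which is the undocumented part. AbandonedMutexDemo and OwnerExitReleasesMutex are illustrative names.

```csharp
using System;
using System.Threading;

public static class AbandonedMutexDemo
{
    // Returns true if the waiter observed the mutex as abandoned.
    public static bool OwnerExitReleasesMutex()
    {
        using (var mutex = new Mutex(false))
        {
            var owner = new Thread(() =>
            {
                // Acquire and deliberately never call ReleaseMutex;
                // the mutex is abandoned when this thread exits.
                mutex.WaitOne();
            });
            owner.Start();
            owner.Join();

            try
            {
                mutex.WaitOne();
                mutex.ReleaseMutex();
                return false;
            }
            catch (AbandonedMutexException)
            {
                // The wait still succeeded: this thread now owns the mutex.
                mutex.ReleaseMutex();
                return true;
            }
        }
    }
}
```

In the service, the request thread would acquire such a mutex just before or after releasing the COM object and then simply exit, while the creation path waits on the same mutex and treats AbandonedMutexException as a successful acquisition.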

If Win32 mutex objects don't do the trick (or if you very reasonably prefer not to rely on undocumented behaviour) you might need to implement the lock yourself. One way to do this would be to use OpenThread to get a handle to the current thread and save this in your lock object, along with an event or similar object. If the lock has been claimed and you want to wait for it to be available, use WaitForMultipleObjects to wait until either the thread handle or the event is signaled. If the event is signaled, the lock has been explicitly released; if the thread handle is signaled, it was implicitly released by the thread exiting. Obviously implementing this involves a lot of tricky details (for example: when a thread explicitly releases the lock, you can't close the thread handle because another thread might be waiting on it, so you'll have to close it when the lock is next claimed instead) but it shouldn't be too difficult to sort these out.

Upvotes: 1

ogggre

Reputation: 2264

As long as all of your modules run in one process, you can hook the Windows API by replacing some system calls with your own wrappers. You can then wrap those calls in a single critical section to avoid the deadlock.

There are several libraries and samples for doing this; the technique is commonly known as detouring:

http://www.codeproject.com/Articles/30140/API-Hooking-with-MS-Detours

http://research.microsoft.com/en-us/projects/detours/

And of course the wrappers should be implemented in native code, preferably C++. .NET detours also work for high-level API functions such as MessageBox, but if you try to reimplement the LoadLibrary API call in .NET you may run into a cyclic dependency, because the .NET runtime itself calls LoadLibrary internally, and does so often.

So the solution looks like this to me: a separate .DLL module which is loaded at the very start of your application. The module fixes the deadlock problem by patching several VB and Windows API calls with your own wrappers. All wrappers do one thing: wrap the call in a critical section and invoke the original API function to do the real job.

Upvotes: 2
