Reputation: 39
We have a server application that communicates with clients over TCP sockets. After running for a few weeks, it crashes with a NullReferenceException that cannot be handled. I have been able to reproduce the exception with a very small console program, and it appears to be an unhandled exception on the internal socket thread pool, so I cannot catch it with any try/catch block because that code is outside my control.
Does anybody have any idea about this? Is it a framework bug, or is there a way to catch the exception on the socket thread pool so that our application does not crash? Here is example code that produces the exception after a few iterations (3-10). It is important to know that the server is offline, so the socket cannot connect. I am using Visual Studio 2010 and .NET Framework 4.0.
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

internal class Program
{
    private static string host;
    private static Socket socket;

    private static void Main(string[] args)
    {
        Trace.Listeners.Add(new ConsoleTraceListener());
        AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);
        socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        host = "127.0.0.1";
        //the problem also happens when the host is another network IP address
        //host = "192.168.0.1";
        //when run in another thread like this, the application does not crash
        //Task.Factory.StartNew(() => StartConnecting());
        //this also crashes the application
        //Task.Factory.StartNew(() => StartConnecting(), TaskCreationOptions.LongRunning);
        //when it is a regular thread, the exception occurs
        ///*
        var thread = new Thread(new ThreadStart(StartConnecting));
        thread.Start();
        //*/
        //when the call is blocking, the exception also occurs
        //StartConnecting();
        Console.WriteLine("Press any key to exit ...");
        Console.ReadKey();
    }

    private static void StartConnecting()
    {
        try
        {
            int count = 0;
            while (true)
            {
                try
                {
                    //should I switch to Socket.Connect(...) instead?
                    Trace.WriteLine(string.Format("Connect Try {0} begin", ++count));
                    var ar = socket.BeginConnect(host, 6500, new AsyncCallback(ConnectCallback), socket);
                    Trace.WriteLine(string.Format("Connect Try {0} end", count));
                }
                catch (Exception err)
                {
                    Trace.WriteLine(string.Format("[BeginConnect] error {0}", err.ToString()));
                }
                System.Threading.Thread.Sleep(1000);
                //will see the exception more quickly
            }
        }
        catch (Exception e)
        {
            Trace.WriteLine(string.Format("[StartConnecting] error {0}", e.ToString()));
        }
    }

    private static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)
    {
        string msg = e.ExceptionObject.ToString();
        Trace.WriteLine(string.Format("[CurrentDomain_UnhandledException] isTerminating={0} error {1}", e.IsTerminating, msg));
        Trace.WriteLine("Exiting process");
        //the other processing threads continue working
        //without problems until there is a Thread.Sleep
        //Thread.Sleep(10000);
    }

    private static void ConnectCallback(IAsyncResult ar)
    {
        try
        {
            Trace.WriteLine("[ConnectCallback] enter");
            var socket = (Socket)ar.AsyncState;
            socket.EndConnect(ar);
            Trace.WriteLine("[ConnectCallback] exit");
        }
        catch (Exception e)
        {
            Trace.WriteLine(string.Format("[ConnectCallback] error {0}", e.ToString()));
        }
    }
}
After the application starts the inevitable crash will occur:
[CurrentDomain_UnhandledException] isTerminating=True error System.NullReferenceException: Object reference not set to an instance of an object.
at System.Net.Sockets.Socket.ConnectCallback()
at System.Net.Sockets.Socket.RegisteredWaitCallback(Object state, Boolean timedOut)
at System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(Object state, Boolean timedOut)
Upvotes: 3
Views: 3195
Reputation: 138925
I'm pretty confident this uncatchable error is caused by a bug in the Socket code, and you should report it on Microsoft Connect.
Here is an extract from the Socket.cs code at .NET reference source: http://referencesource.microsoft.com/#System/net/System/Net/Sockets/Socket.cs,938ed6a18154d0fc
private void ConnectCallback()
{
    LazyAsyncResult asyncResult = (LazyAsyncResult) m_AcceptQueueOrConnectResult;
    // If we came here due to a ---- between BeginConnect and Dispose
    if (asyncResult.InternalPeekCompleted)
    {
        // etc.
        return;
    }
}
This callback is called by another static method:
private static void RegisteredWaitCallback(object state, bool timedOut)
{
    Socket me = (Socket)state;
    // Interlocked to avoid a race condition with DoBeginConnect
    if (Interlocked.Exchange(ref me.m_RegisteredWait, null) != null)
    {
        switch (me.m_BlockEventBits)
        {
            case AsyncEventBits.FdConnect:
                me.ConnectCallback();
                break;
            case AsyncEventBits.FdAccept:
                me.AcceptCallback(null);
                break;
        }
    }
}
This static method is never unregistered and is always called, but it relies on the m_RegisteredWait event to determine whether it must pass control on to the socket's member method.
The problem, I suppose, is that this event is sometimes not null while m_AcceptQueueOrConnectResult is null, which causes the NullReferenceException on a thread where you cannot catch it.
That being said, the root cause is that your code has problems in the first place, as others have noted. To avoid this horrible uncatchable error, just make sure you call Close or Dispose on the socket when an error happens; this internally clears the m_RegisteredWait member. For example, the BeginConnect documentation says this:
To cancel a pending call to the BeginConnect method, close the Socket. When the Close method is called while an asynchronous operation is in progress, the callback provided to the BeginConnect method is called. A subsequent call to the EndConnect method will throw an ObjectDisposedException to indicate that the operation has been cancelled.
In your example, just add the following line to the catch block of your callback:
private static void ConnectCallback(IAsyncResult ar)
{
    try
    {
        ...
    }
    catch (Exception e)
    {
        // "socket" is the static field from your example
        if (socket != null) socket.Dispose();
    }
}
Now, you'll still have errors but they will be normal errors.
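A fuller sketch of that idea, assuming you are free to create a fresh socket for each attempt instead of reusing the static one. The names DisposeOnFailureSketch, Attempt and ConnectOnce are hypothetical and only serve the illustration; the callback disposes the socket as soon as EndConnect fails, so a failed socket is never handed back to BeginConnect.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

internal static class DisposeOnFailureSketch
{
    // Hypothetical helper type: carries the socket and a completion signal to the callback.
    private sealed class Attempt
    {
        public readonly Socket Socket =
            new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        public readonly ManualResetEvent Done = new ManualResetEvent(false);
        public bool Connected;
    }

    private static void ConnectCallback(IAsyncResult ar)
    {
        var attempt = (Attempt)ar.AsyncState;
        try
        {
            attempt.Socket.EndConnect(ar); // throws if the connect failed
            attempt.Connected = true;
        }
        catch (Exception e)
        {
            Console.WriteLine("[ConnectCallback] error {0}", e.Message);
            attempt.Socket.Dispose(); // never reuse a socket whose connect failed
        }
        finally
        {
            attempt.Done.Set(); // tell the caller this attempt is over
        }
    }

    // Returns a connected socket, or null if the attempt failed.
    public static Socket ConnectOnce(string host, int port)
    {
        var attempt = new Attempt();
        try
        {
            attempt.Socket.BeginConnect(host, port, ConnectCallback, attempt);
            attempt.Done.WaitOne(); // block until the callback has run
        }
        catch (SocketException)
        {
            attempt.Socket.Dispose(); // BeginConnect itself failed
        }
        return attempt.Connected ? attempt.Socket : null;
    }
}
```

A retry loop would call ConnectOnce, sleep, and call it again only when it returned null, so no two connect operations ever run on the same socket.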
Upvotes: 1
Reputation: 161773
If you look carefully at the stack trace, you'll see that the NullReferenceException occurs in System.Net.Sockets.Socket.ConnectCallback. If you look at your code, you'll see that you have a method named ConnectCallback.
That's what we call a "coincidence".
Please change the name of your callback method to MyConnectCallback, and change the BeginConnect call to:
var ar = socket.BeginConnect(host, 6500, new AsyncCallback(MyConnectCallback), socket);
See if that changes anything.
If I'm correct, and your ConnectCallback method is never called, then I'm also forced to wonder how your code works at all.
Upvotes: 0
Reputation: 14521
The sample code you provided repeatedly calls BeginConnect without waiting for the async operation to complete.
Roughly, you're doing this:
while (true)
{
    socket.BeginConnect(...);
    Sleep(1000);
}
So when your thread starts, it first calls BeginConnect(), waits one second, and then calls BeginConnect() again while the previous call is still executing.
On my computer this gives me an InvalidOperationException, but I guess the exception type may depend on the CLR version (I'm using .NET 4.5.1).
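One way to avoid the overlap is to give each attempt its own socket and start the next attempt only after the previous one has finished or been cancelled. This is a minimal sketch under those assumptions: ConnectRetrySketch, TryConnectOnce and ConnectWithRetries are hypothetical names, and the 5-second timeout and 1-second pause are arbitrary values, not taken from the question.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

internal static class ConnectRetrySketch
{
    // Tries to connect once. Returns a connected socket, or null on failure or timeout.
    public static Socket TryConnectOnce(string host, int port, TimeSpan timeout)
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            IAsyncResult ar = socket.BeginConnect(host, port, null, null);
            if (!ar.AsyncWaitHandle.WaitOne(timeout))
            {
                // Still pending: closing the socket cancels the connect.
                socket.Close();
                return null;
            }
            socket.EndConnect(ar); // throws SocketException if the connect failed
            return socket;
        }
        catch (SocketException)
        {
            socket.Close(); // a failed socket is discarded, not reused
            return null;
        }
    }

    // Retries sequentially: attempt N+1 starts only after attempt N is over.
    public static Socket ConnectWithRetries(string host, int port, int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            Socket s = TryConnectOnce(host, port, TimeSpan.FromSeconds(5));
            if (s != null) return s;
            Console.WriteLine("Connect try {0} failed", attempt);
            Thread.Sleep(1000);
        }
        return null;
    }
}
```

With the values from the question, the call would look like ConnectRetrySketch.ConnectWithRetries("127.0.0.1", 6500, 10); the caller owns the returned socket and should close it when done.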
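Here are 3 different solutions:
1. Wait for the operation to complete with Socket.EndConnect() before the next attempt.
2. Wait on IAsyncResult.AsyncWaitHandle.WaitOne() before the next attempt.
3. Stop using BeginConnect() and use Connect() instead.
Upvotes: 1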