Reputation: 5973
I'm working on a status monitoring program for my company, and we need to be able to tell whether a process on a remote machine is hanging because it's not responding (that part I can do) or because it has thrown an unhandled exception. I've tried a number of things, from checking the process's CPU usage to checking whether it has gone idle, but nothing has been reliable. Is there any way to do this? All the programs we'll be monitoring run on either .NET 3.5 or 4.0.
Upvotes: 3
Views: 705
Reputation: 5973
Thanks for the answers, guys. I've got a decent workaround for what I need:
After researching a bit, I found that the .NET programs I'm monitoring run 4 threads on startup, and none of them open any new threads. I also found, after some testing, that the unhandled-exception message is shown on a new thread and causes the program's CPU usage to drop to 0. So I'm checking for both of these conditions:
if CPU = 0 and threadCount > 4
If both are true, chances are an exception was thrown: why else would the program be sitting there at 0% CPU with more than 4 threads? As far as I can see right now, there isn't any other reason.
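A minimal sketch of that check, assuming you already collect (CPU %, thread count) samples for each process by some other means. The baseline of 4 threads is only what was observed for these particular programs, not a .NET guarantee. Requiring several consecutive matching samples (the hypothetical `required` parameter) is an added assumption meant to avoid flagging a process that is merely idle for a moment.

```python
from typing import List, Tuple

# Observed baseline for the monitored programs; not guaranteed by the .NET runtime.
BASELINE_THREADS = 4


def looks_like_unhandled_exception(cpu_percent: float, thread_count: int,
                                   baseline_threads: int = BASELINE_THREADS) -> bool:
    """True when the process is idle (0% CPU) but has more threads than its baseline,
    e.g. because an exception dialog was opened on a new thread."""
    return cpu_percent == 0 and thread_count > baseline_threads


def flag_process(samples: List[Tuple[float, int]], required: int = 3,
                 baseline_threads: int = BASELINE_THREADS) -> bool:
    """samples: (cpu_percent, thread_count) pairs, oldest first.
    Flag only if the last `required` samples all match the heuristic (hypothetical policy)."""
    if len(samples) < required:
        return False
    return all(looks_like_unhandled_exception(cpu, threads, baseline_threads)
               for cpu, threads in samples[-required:])


if __name__ == "__main__":
    history = [(12.5, 4), (0.0, 5), (0.0, 5), (0.0, 5)]
    print(flag_process(history))
```

Note that a process sitting at 0% CPU with exactly 4 threads is not flagged, since it is simply idle.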
Upvotes: 1
Reputation: 6876
You might try monitoring the event log on the machine the process is running on. When a process throws an unhandled exception, an entry is written there with the process name and some details about the error.
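As a rough sketch of the filtering side, assuming the Application log entries have already been read (for example with `System.Diagnostics.EventLog` on the .NET side) into a simple record type. `EventRecord` and `crash_events_for` are hypothetical names, and the default (source, event ID) pairs are assumptions: ".NET Runtime" 1026 is what .NET 4.0 typically logs for an unhandled exception and "Application Error" 1000 is the generic crash entry. Verify both against your own machines, especially for .NET 3.5.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class EventRecord:
    # Hypothetical shape for an event log entry; adapt to however you read the log.
    log: str
    source: str
    event_id: int
    message: str


# Assumed (source, event_id) pairs that indicate a crash; check these on your systems.
CRASH_EVENTS: Set[Tuple[str, int]] = {(".NET Runtime", 1026), ("Application Error", 1000)}


def crash_events_for(records: Iterable[EventRecord], exe_name: str,
                     crash_events: Set[Tuple[str, int]] = CRASH_EVENTS) -> List[EventRecord]:
    """Return Application-log crash entries whose message mentions exe_name (case-insensitive)."""
    needle = exe_name.lower()
    return [r for r in records
            if r.log == "Application"
            and (r.source, r.event_id) in crash_events
            and needle in r.message.lower()]
```

Matching on the executable name inside the message text is deliberately loose; if two monitored programs share a name, you would need to match on more detail.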
There are a few other questions here on Stack Overflow that deal with this same problem:
catch another process unhandled exception
How to catch exceptions from processes in C#
Upvotes: 1
Reputation: 141
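Would it help to check the Event Log for entries that refer to the process you are monitoring? Note that unhandled .NET exceptions are recorded in the Application log rather than System or Security, under sources such as ".NET Runtime" or "Application Error".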
Upvotes: 1
Reputation: 44941
This is much easier if you control the source code of those applications.
The way we handle this scenario is that the monitored applications periodically write their status to a central database. The monitoring application then checks the status in these tables on a regular basis and, if a status isn't updated within a specific period of time (e.g. 2 minutes), generates alerts.
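A minimal sketch of this heartbeat pattern, using an in-memory SQLite database as a stand-in for the central database. The table name `app_status`, its columns, and the function names are hypothetical; only the 2-minute threshold comes from the description above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional

STALE_AFTER = timedelta(minutes=2)  # alert threshold from the description above


def init(conn: sqlite3.Connection) -> None:
    # One row per monitored application (hypothetical schema).
    conn.execute("CREATE TABLE IF NOT EXISTS app_status ("
                 "app_name TEXT PRIMARY KEY, status TEXT NOT NULL, last_seen TEXT NOT NULL)")


def write_heartbeat(conn: sqlite3.Connection, app_name: str, status: str = "OK",
                    now: Optional[datetime] = None) -> None:
    """Called periodically by each monitored application."""
    if now is None:
        now = datetime.now(timezone.utc)
    conn.execute("INSERT OR REPLACE INTO app_status (app_name, status, last_seen) VALUES (?, ?, ?)",
                 (app_name, status, now.isoformat()))
    conn.commit()


def stale_apps(conn: sqlite3.Connection, now: Optional[datetime] = None,
               stale_after: timedelta = STALE_AFTER) -> List[str]:
    """Called by the monitor: names of apps that haven't reported within stale_after."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - stale_after
    rows = conn.execute("SELECT app_name, last_seen FROM app_status").fetchall()
    return sorted(name for name, seen in rows if datetime.fromisoformat(seen) < cutoff)
```

Because the monitor only looks at how old each row is, a crashed process, a hung process, a network outage, and a rebooted machine all show up the same way: the heartbeat simply stops.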
This helps us identify not only application problems, but also connectivity problems and unexpected machine reboots.
Upvotes: 1