Reputation: 710
Is it a good idea to keep Application Insights Profiler enabled all the time in a production Azure Web App? Or should we enable it only when we have performance issues?
I tried enabling it in production, but I do not know whether keeping the profiler on will have any impact on our application.
Upvotes: 2
Views: 870
Reputation: 20445
EDIT 5/24/24: The team at MS that's responsible for this got in touch with me (!!) and they're working the issue. It's apparently really definitely not supposed to work this way. I'll update this post as I get more information.
EDIT 6/20/24: The CLR team has opened this issue here to help address the problem: https://github.com/dotnet/runtime/issues/102858. Their initial diagnosis is that we were generating a very large number of dynamic Entity Framework queries, which was resulting in a huge number of methods being emitted (2 million+), all of which needed to be jitted; and all those methods hitting the JIT were resulting in the system locking during the "rundown" that was required as part of the live profiling.
I should mention that although we've removed as many of the dynamic EF query generations as we could find, the problem still remains - we've tried re-enabling the profiler several times, and the problem returns immediately. So we may just need to wait for the CLR fix.
DO NOT ENABLE APPLICATION INSIGHTS PROFILER SAMPLING IN PRODUCTION.
We were having some small but annoying performance issues with our website, so we enabled Application Insights, and just accepted all the standard defaults, including periodic profiling once an hour or so. The documentation said it would have a small 10-15% CPU impact, and CPU was not even remotely a problem, so that seemed like an easy trade-off.
Over the next couple months, our traffic grew substantially, and our performance issues grew substantially more severe. According to what we could see from Application Insights, it seemed the most likely candidate was thread pool starvation, so we spent weeks hunting through our app for places where we might be locking threads or simply not using async patterns, to no avail. Indeed, things continued to get worse.
Finally, on a whim, I decided to disable the Application Insights profiler sampling.
Immediately, our problems went away. All the weird "the website is taking 10 minutes to respond to a simple API call" problems ... just disappeared.
[Graph: active threads and thread pool queue length on our web servers. The red oval marks the point where we disabled profiler sampling.]
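To give a rough feel for why a periodic pause can look like thread pool starvation, and why it gets worse as traffic grows, here is a toy simulation in Python. It is only a sketch under stated assumptions: it does not model the CLR, the profiler, or the actual rundown. The function name `simulate_queue` and all numbers are hypothetical, chosen purely for illustration.

```python
def simulate_queue(ticks, workers, arrivals_per_tick, stall_start, stall_len):
    """Return the queue length after each tick for a fixed-size worker pool.

    Assumption (hypothetical model): each worker completes one request per
    tick, except during the stall window, when nothing completes at all.
    The stall stands in for a runtime-wide pause.
    """
    queue = 0
    history = []
    for t in range(ticks):
        queue += arrivals_per_tick                      # new requests arrive
        stalled = stall_start <= t < stall_start + stall_len
        capacity = 0 if stalled else workers            # no progress while stalled
        queue -= min(queue, capacity)                   # workers drain the queue
        history.append(queue)
    return history


# Same pool and same 5-tick stall, at two traffic levels.
moderate = simulate_queue(ticks=80, workers=10, arrivals_per_tick=8,
                          stall_start=5, stall_len=5)
heavy = simulate_queue(ticks=80, workers=10, arrivals_per_tick=9,
                       stall_start=5, stall_len=5)

print("peak backlog, moderate traffic:", max(moderate))
print("peak backlog, heavy traffic:   ", max(heavy))
```

In this toy model, at 8 arrivals per tick the backlog peaks at 40 and takes 20 ticks to clear after the stall ends; at 9 arrivals per tick it peaks at 45 and takes 45 ticks. The pool is never short of capacity on average in either case, yet a short pause leaves a long tail of queued requests once headroom shrinks, which is consistent with things getting much worse as our traffic grew.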
I learned my lesson.
DO NOT ENABLE APPLICATION INSIGHTS PROFILER SAMPLING IN PRODUCTION.
Upvotes: 5
Reputation: 29770
[..] but do not know whether keeping the profiler on will have any negative impact on the application
It does, according to the docs:
When Profiler is running, it adds from 5 percent to 15 percent CPU overhead to the server.
It is up to you to decide whether this is acceptable or not.
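As a rough way to judge that, the following Python sketch estimates server CPU while the profiler is running. It assumes the documented 5 to 15 percent is added as percentage points on top of your baseline utilization; the docs quote above does not spell out how the figure should be applied, so treat that as my assumption. The helper name `cpu_with_profiler` is hypothetical.

```python
def cpu_with_profiler(baseline_pct, overhead_low=5.0, overhead_high=15.0):
    """Estimate the CPU utilization range (in percent) while profiling.

    Assumption: the overhead is additive percentage points on top of the
    baseline, capped at 100 percent.
    """
    low = min(100.0, baseline_pct + overhead_low)
    high = min(100.0, baseline_pct + overhead_high)
    return low, high


# Example: a server that normally sits at 40 percent CPU.
low, high = cpu_with_profiler(40)
print(f"Expected CPU while profiling: {low:.0f}% to {high:.0f}%")
```

Note that this only covers the CPU cost the docs mention. It says nothing about other side effects, so low CPU usage alone does not guarantee the profiler is harmless for your workload.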
Personally, I wouldn't run it all the time, only when there are performance issues. You might want to run it once in a while to proactively check for issues before they become a real problem.
Upvotes: 1