Reputation: 19213
Let's say I have a contrived program:
#include <Windows.h>
void useless_function()
{
Sleep(5000);
}
void useful_function()
{
// ... do some work
useless_function();
// ... do some more work
}
int main()
{
useful_function();
return 0;
}
Objective: I want the profiler to tell me useful_function()
is needlessly calling useless_function()
which waits for no obvious reasons. Under XPerf, this doesn't show up in any of the graphs I have because the call to WaitForMultipleObjects()
seem to be accounted to Idle.exe
instead of my own program.
And here's the xperf command line that I currently run:
xperf -on Latency -stackwalk Profile
Any ideas?
(This is not restricted to wait functions. The above might have been solved by placing breakpoints at NtWaitForMultipleObjects
. Ideally there could be a way to see the stack sample that's taking up a lot of wall-clock time as opposed to only CPU time)
Upvotes: 3
Views: 964
Reputation: 3482
Wait Analysis is the way to do this. You should:
Then you use CPU Usage (Precise) in WPA (or xperfview, but that's ancient) to look at the context switches and find where your TimeSinceLast is high on a thread that shouldn't be going idle. You'll typically want the columns in CPU Usage (Precise) in this sort of order:
For details see these particular articles from my blog: - https://randomascii.wordpress.com/2014/08/19/etw-training-videos-available-now/ - https://randomascii.wordpress.com/2012/06/19/wpaxperf-trace-analysis-reimagined/
Upvotes: 1
Reputation: 1294
I think what you are looking for is the Wait analysis with Ready Thread functionality in Xperf. It captures every context switch and gives you the call stack of the thread once it wakes up from sleep (or an otherwise blocked operation). In your case, you would see the stack just after the call sleep(5000) as well as the time spend sleeping.
The functionality is a bit obscure to use. But it is fortunately well described here:
Use Xperf's Wait Analysis for Application-Performance Troubleshooting
Upvotes: 2
Reputation: 40679
This "profiler" will tell you - just randomly pause it a few times and look at the stack. If do some work
takes 5 seconds, and do some more work
takes 5 seconds, then 33% of the time the stack will look like this
main: calling useful_function
useful_function: calling useless_function
useless_function: calling Sleep
So roughly 33% of your stack samples will show exactly that. Any line of code that's costing some fraction of wall-clock time will appear on roughly that fraction of samples.
On the rest of the samples you will see it doing the other things.
There are automated profilers that do the same thing in a more pretty way, such as Zoom and LTProf, although they don't actually show you the samples.
I looked at the xperf doc, trying to figure out if you could get stack samples on wall-clock time and get percents at line-level resolution. It seems you gotta be on Windows 7 or Vista. They only bother with functions, not lines, which if you have realistically big functions, is important. I couldn't figure out how to get access to the individual samples, which I think is important for seeing why the program is spending its time.
Upvotes: 0