patentfox

Reputation: 1464

Profiling multithreaded code, how does sampling work

I am using Visual Studio to profile my multithreaded C++ application. From what I have read about the sampling method, I understand that it looks at the processor at specified intervals to see which function is executing.

I am curious how it handles multithreaded code. It is quite possible that two or more functions are executing at once on different threads on different cores. In that case, does the sampling method increment the counter for both of those functions? I believe this is what actually happens.

This makes it difficult to derive insights from the profiling report. The function with the most collected samples, if running on a worker thread, may be executing on a different core than the main thread and may not impact application performance at all. But if it switches out the main thread to perform its work, then it should have an observable impact on performance.

Is there a better way to profile multithreaded code?

Upvotes: 0

Views: 1411

Answers (2)

Mike Dunlavey

Reputation: 40709

Is there a better way to profile multithreaded code?

I always have to ask, because these are not the same thing:
Are you looking for 1) what is taking wall-clock time and could be corrected to speed up the application, or 2) measurements of various kinds, like function call counts, CPU self time, CPU inclusive time, hot path, etc.?

Assuming the answer is 1, the method I (and many others) use is simply to pause the application under the Visual Studio IDE, several times if necessary. When you do that, it pauses all threads, and you can display the call stack of every thread. This shows you what each thread is waiting for, and why. On some fraction of the pauses, one or more of the threads will be in the middle of either some computation, or some system wait or I/O, that you might deem avoidable.

You could call it a "poor person's profiler", but here's how it goes beyond profiler output:

  • You don't have to care if the problem is in computation or I/O, or guess which it is and choose different profiling methods. Either way, you see it.

  • If you want to know the inclusive fraction of time spent in a function/method, roughly, it is the fraction of samples where the function is on the stack. The same goes for any line of code. If you want to know the exclusive (self) fraction, it is the fraction of samples where the function or line of code is at the end of the stack. (A small sketch of this counting follows the list.)

  • If you want to know what fraction of time is spent with function A calling function B, it is the fraction of samples where A calls B. If you're interested in A calling B through an intermediary, you can also see that (which no call graph can tell you).

  • Suppose the stack is 30 levels deep, ending in some I/O, and you want to know what part of your code is causing it to do that: just scan up the stack, looking at each line of code until you find it. Note that this is probably not the "hot path", because there may be multiple ways of getting to the problem code.

  • When you do this, you can not only see the responsible line(s) of code, but also examine the values of the relevant data variables. Profilers cannot show you these; you have to guess.

  • It doesn't waste your time telling you that lots of things are not problems because they take small percentages. (Sometimes people think they are only looking for small things, like 5% or less, while making the rosy assumption that there is nothing bigger. A profiler can lead one to make that assumption, because with it you can't see anything bigger.)

  • It allows you to concentrate on the code you can do something about, your code, as opposed to system code.

  • You don't have to hunt through timelines to find the interval of interest. You pause it when it's making you wait; it's hard to pause it any other time. So it will tell you why it's making you wait.
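
If it helps to make that counting concrete, here is a minimal sketch (the function names and stack samples are made up for illustration, not taken from any profiler) of how the inclusive and exclusive fractions fall out of a pile of pause-time call stacks:

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        using Stack = std::vector<std::string>;   // outermost caller first
        // Four hypothetical pause-time samples of one thread's call stack.
        std::vector<Stack> samples = {
            {"main", "LoadFile", "Parse"},
            {"main", "LoadFile", "ReadBytes"},
            {"main", "Render"},
            {"main", "LoadFile", "Parse"},
        };

        const std::string fn = "LoadFile";
        int inclusive = 0, exclusive = 0;
        for (const Stack& s : samples) {
            if (std::find(s.begin(), s.end(), fn) != s.end())
                ++inclusive;                       // on the stack anywhere
            if (!s.empty() && s.back() == fn)
                ++exclusive;                       // at the end of the stack
        }
        std::cout << fn << " inclusive: " << 100.0 * inclusive / samples.size() << "%\n"
                  << fn << " exclusive: " << 100.0 * exclusive / samples.size() << "%\n";
        // Prints 75% inclusive and 0% exclusive for this made-up data.
    }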

Upvotes: 0

Alexey Alexandrov

Reputation: 3129

When sampling function execution, profilers typically sample each software thread individually. So, if you have three threads executing CPU-intensive foo(), bar(), and baz() functions respectively, the sampling frequency is 100 Hz, and the profiling session lasts 1 s, you would get 100 samples in each of the functions.
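
As an illustration of that scenario (foo(), bar(), baz() and their busy loops are placeholders, not anything your profiler requires), the following sketch spins three CPU-bound threads for one second; a 100 Hz sampling profiler attached for that second would attribute roughly 100 samples to each of the three functions:

    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> stop{false};

    // Each function burns CPU in its own loop, so samples land in that function.
    void foo() { volatile unsigned long long x = 0; while (!stop) x = x + 1; }
    void bar() { volatile unsigned long long x = 0; while (!stop) x = x + 1; }
    void baz() { volatile unsigned long long x = 0; while (!stop) x = x + 1; }

    int main() {
        std::thread t1(foo), t2(bar), t3(baz);
        std::this_thread::sleep_for(std::chrono::seconds(1));  // the 1 s profiling window
        stop = true;
        t1.join(); t2.join(); t3.join();
    }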

A decent profiler will also typically give you a way to filter the data by a given thread, so that you can see which hotspots are present on which thread in isolation. For example, what happens on the main thread may matter a great deal if the main thread is where the UI rendering is done in the framework you use.

Figuring out how computations done in background threads affect application responsiveness is a broad topic in itself and is often application-specific. Some patterns:

  • Look for where the main thread gets blocked. It may be blocking while waiting for the result of a background computation (see the sketch after this list).
  • Look for whether the main thread had any points where it had to skip doing something because the data wasn't readily available. This is especially common in UI / rendering code: if the data is not ready by the time a frame needs to be rendered, the code has no choice but to skip rendering that frame, causing user-visible jank in the UI.
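
Here is a minimal sketch of the first pattern, assuming a std::async-style hand-off (compute_result() and its 200 ms of work are made-up placeholders): the main thread launches a background computation and then blocks in get() until the result arrives, which is exactly the kind of wait that a per-thread view or a pause-time call stack makes visible.

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>

    int compute_result() {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));  // stand-in for real work
        return 42;
    }

    int main() {
        // The background computation runs on a worker thread.
        std::future<int> result = std::async(std::launch::async, compute_result);

        // ... main thread does other work here ...

        // If the result isn't ready yet, the main thread blocks right here;
        // a per-thread profile (or a paused call stack) shows the main thread
        // waiting while the worker thread is the one using the CPU.
        std::cout << "result = " << result.get() << "\n";
    }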

Hope this helps.

Upvotes: 2
