Ali Yekta

Reputation: 45

Profiling: Self time vs Total time for a simple line of code

When profiling, we know that 'self time' means the time spent in the method itself, excluding the functions it calls, whereas 'total time' means the overall time of that method, including its child function calls.
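For example (a hypothetical pair of functions, just to illustrate the two definitions):

void child() {
    // time spent here counts as child's self time;
    // it also rolls up into parent's total (inclusive) time
}

void parent() {
    long sum = 0;
    for (long i = 0; i < 1000000; ++i)
        sum += i;    // charged to parent's self time (and its total time)
    child();         // charged only to parent's total time, not its self time
}

int main() { parent(); }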

I have a simple function in which I'm only doing computation on arrays and not calling any other function. However, when profiled, the self and total times are quite different, both for the function as a whole and for each line in it. Here is a sample profiled line from Zoom Profiler, run with 32 threads.

Timing of the line in the for loop:

Total        Self         Code
29.4 sec     16.9 sec     id=*(pid); 

What baffles me is that, by the definitions in the first paragraph, a line of code that doesn't call any method should have identical self and total times. Any ideas on what is happening?

Upvotes: 3

Views: 3189

Answers (1)

Mike Dunlavey

Reputation: 40689

I wouldn't worry too much about that.

The statistics that profilers show, even good profilers, have to be taken with a grain of salt. What those stats imply is that on some stack samples containing that line of code, it was not the last line on the stack, which is hard to believe for a line that calls nothing. It's easier to believe the profiler is mistaken. These days, with processors whose pipelines run way ahead of the instruction being retired, attributing time to a single line can be pretty confusing.

In any case, if the reason you are profiling is to find speedups, it's hard to see how to speed up that particular line of code. You're looking for stuff you can speed up, not stuff you can't.

BTW, I would recommend not looking at seconds or milliseconds or any absolute time measurement, but rather at fractional time: time divided by total time. Real speedups tend to account for large fractions, from 5% to 99%. Furthermore, don't bother with self time; inclusive time includes it, and if the program is of any significant size the stack samples get deep, so the non-terminal lines on the samples (i.e. function calls) are a rich hunting ground for speedups. Finally, measurement precision is not necessary, so a large number of samples is not necessary either.

That's why many people use the random-pausing technique.


EDIT: FWIW, here's what I mean. Here are 5 random-time stack samples:

---------------------------
... some calls above
... exp(4.02) ...
... some calls below
---------------------------
... some calls above
... exp(0.35) ...
... some calls below
---------------------------
... some calls above
... push_back(...) ...
...
... new ...
...
---------------------------
...
... myArray[...]
...
---------------------------
... some calls above
... exp(0.35) ...
... some calls below
---------------------------

Three of them have calls to exp on the stack, in your code or code you can edit, quite possibly from different places in the code. That implies roughly 60% of time is spent calling exp. You can't speed up exp itself, but could you call it less?
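(If you want to put a number on "roughly": when something shows up on s out of n random samples, a simple rule-of-succession estimate of its time fraction is (s+1)/(n+2), so 3 of 5 gives about 57%. The samples are there to find big targets, not to measure them precisely.)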

Now notice that the second and fifth samples call exp with the same numeric argument. (That's something no profiler could tell you.) What does that mean? It means roughly 40% of time is spent in exp calls with repeated arguments, which you could possibly avoid by remembering prior values, or by memoizing exp.
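Here is a minimal sketch of what memoizing exp could look like (a hypothetical wrapper, not from the question; it keys the cache on exact double values, and as written the static cache is not safe to share across your 32 threads without a lock or a per-thread copy):

#include <cmath>
#include <unordered_map>

// Hypothetical memoized wrapper around exp: returns a cached result
// for any argument it has seen before, calling exp only on a miss.
// NOTE: the static cache is not thread-safe as written.
double memo_exp(double x) {
    static std::unordered_map<double, double> cache;
    auto it = cache.find(x);
    if (it != cache.end())
        return it->second;   // repeated argument: no exp call at all
    double v = std::exp(x);
    cache.emplace(x, v);
    return v;
}

Whether this wins depends on how often arguments actually repeat; the samples above suggest repeats account for roughly 40% of the time, which is what makes it worth trying.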

If you take samples until you see some pattern that you could do something about, and you see it two or more times, you've got a healthy speedup. The fewer samples you have to take to get to that point, the bigger the speedup. Then you can repeat the whole process, until you can't any more. That's a reliable way to get speed that's hard to beat.

Upvotes: 3
