user34537


How to organize data (writing your own profiler)

I was thinking about using reflection to generate a profiler. Let's say I am generating the code without a problem; how do I properly measure and organize the results? I'm mostly concerned about CPU time, but memory suggestions are welcome.

Upvotes: 0

Views: 80

Answers (1)

Mike Dunlavey

Reputation: 40669

There are lots of bad ways to write profilers.

I wrote what I thought was a pretty good one over 20 years ago. That is, it made a decent demo, but when it came down to serious performance tuning I concluded there really is nothing that works better, and gives better results, than the dumb old manual method, and here's why.

Anyway, if you're writing a profiler, here's what I think it should do:

  • It should sample the stack at unpredictable times, and each stack sample should contain line number information, not just functions, in the code being tuned. It's not so important to have that in system functions you can't edit.

  • It should be able to sample during blocked time like I/O, sleeps, and locking, because those are just as likely to result in slowness as CPU operations.

  • It should have a hot-key the user can press to enable sampling during the times they actually care about (for example, not while the program is waiting for the user to do something).

  • Do not assume you need measurement precision, and hence a large number of frequent samples. This is incredibly basic, and it is a major reversal of common wisdom. The reason is simple - it does no good to measure problems if the price you pay is failing to find them. That's what happens with profilers - speedups hide from them, so the user is content with finding maybe one or two small speedups while giant ones get away. Giant speedups are the ones that take a large percentage of time, and the number of stack samples it takes to find them is inversely proportional to that percentage. If the program spends 30% of its time doing something avoidable, it takes (on average) 2/0.3 = 6.67 samples before it is seen twice, and that's enough to pinpoint it.

    To answer your question, if the number of samples is small, it really doesn't matter how you store them. Print them to a file if you like - whatever. It doesn't have to be fast, because you don't sample while you're saving a sample.

  • What does allow those speedups to be found is when the user actually looks at and understands individual samples. Profilers have all kinds of UI - hot spots, call counts, hot paths, call graphs, call trees, flame graphs, phony 3-digit "statistics", blah, blah. Even if it's well done, that's only timing information. It doesn't tell you why the time is spent, and that's what you need to know. Make eye candy if you want, but let the user see the actual samples.

... and good luck.
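To make those points concrete, here is a minimal sketch in Python (my choice of language for illustration; the names `sample_stack` and `sample_while` are made up for this example). It samples another thread's stack at unpredictable intervals while a flag is set - in a real program a hot-key handler would set and clear the flag - and each sample records function:line pairs, including during blocked time such as sleeps:

```python
import random
import sys
import threading
import time
import traceback

def sample_stack(thread_id):
    """One stack sample as "function:line" pairs, outermost call first.
    Line numbers, not just function names, identify the exact call sites."""
    frame = sys._current_frames().get(thread_id)
    if frame is None:                  # thread has already exited
        return None
    frames = traceback.extract_stack(frame)
    return ", ".join(f"{f.name}:{f.lineno}" for f in frames)

def sample_while(flag, thread_id, out):
    """Append samples of `thread_id` to the list `out` while `flag` is set.
    Run this in its own daemon thread; bind `flag` to a hot-key in a real
    program so the user controls when sampling happens."""
    while flag.is_set():
        time.sleep(random.uniform(0.05, 0.2))   # unpredictable, not fast
        sample = sample_stack(thread_id)
        if sample is None:
            break
        out.append(sample)
```

Since only a moderate number of samples is needed, `out` can simply be printed to a file when sampling stops; no clever storage format is required.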

ADDED: A sample looks something like this:

main:27, myFunc:16, otherFunc:9, ..., someFunc:132

That means main is at line 27, calling myFunc. myFunc is at line 16, calling otherFunc, and so on. At the end, it's in someFunc at line 132, not calling anything (or calling something you can't identify). No need for line ranges. (If you're tempted to worry about recursion - don't. If the same function shows up more than once in a sample, that's recursion. It doesn't affect anything.)
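If you do want a quick summary over a handful of samples in that textual format, the useful number is the fraction of samples on which each location appears, since that estimates the fraction of time it costs. A small sketch (the function name `line_fractions` is mine; reading the raw samples is still the main activity):

```python
from collections import Counter

def line_fractions(samples):
    """Fraction of samples on which each function:line location appears.
    Recursion costs nothing extra: a location is counted at most once per
    sample, no matter how many times it recurs on that sample's stack."""
    counts = Counter()
    for sample in samples:
        counts.update({loc.strip() for loc in sample.split(",")})
    return {loc: n / len(samples) for loc, n in counts.items()}

samples = [
    "main:27, myFunc:16, otherFunc:9",
    "main:27, myFunc:16, helper:40",
    "main:31, report:12",
]
# myFunc:16 is on 2 of 3 samples, suggesting it costs roughly 2/3 of the time
print(line_fractions(samples)["myFunc:16"])
```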

You don't need a lot of samples. When I did it, sampling was not automatic at all. The user would press both shift keys simultaneously, and that would trigger a sample. So the user would grab maybe 10 or 20 samples, but it is crucial to take them during the phase of the program's execution that annoys the user with its slowness, such as between the time some button is clicked and the time the UI responds. Another way is to have a hot-key that runs sampling on a timer while it is pressed. If the program is just a command-line app with no user input, it can sample all the time while it executes. The frequency of sampling does not have to be high. The goal is to get a moderate number of samples during the program phase that is subjectively slow. If you take more samples than you can examine, select some at random to look at.

The thing to do when examining a sample is to look at each line of code in the sample so you can fully understand why the program was spending that instant of time. If it is doing something that might be avoided, and if you see a similar thing on another sample, you've found a speedup. How much of a speedup? This much (the math is here):

[image: the formula for the expected speedup]

For example, if you look at three samples, and on two of them you see avoidable code, fixing it will give you a speedup - maybe less, maybe more, but on average 4x. (That's what I mean by giant speedup. The way you get it is by studying individual samples, not by measuring anything.)
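The math behind that 4x can be reconstructed as a sketch (assuming the unknown fraction of time gets a uniform prior; the result is an expected value, which matches the "maybe less, maybe more, but on average" wording):

```latex
% s = samples showing the avoidable code, out of n samples total.
% With a uniform prior, the fraction x of time it costs has posterior
p(x) = \frac{x^{s}(1-x)^{n-s}}{B(s+1,\;n-s+1)}
% Removing it multiplies speed by 1/(1-x), so the expected speedup is
\mathbb{E}\!\left[\frac{1}{1-x}\right]
  = \int_0^1 \frac{x^{s}(1-x)^{n-s}}{B(s+1,\;n-s+1)\,(1-x)}\,dx
  = \frac{B(s+1,\;n-s)}{B(s+1,\;n-s+1)}
  = \frac{n+1}{n-s}
% For s = 2 of n = 3 samples: (3+1)/(3-2) = 4, the 4x average above.
```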

There's a video here.

Upvotes: 1
