punter147

Reputation: 312

The number of times to run a profiling experiment

I am trying to profile a CUDA application. I have a basic question about performance analysis and workload characterization of HPC programs. Let us say I want to analyse the wall clock time (the end-to-end execution time of a program). How many times should one run the same experiment to account for run-to-run variation in the wall clock time measurement? Thanks.

Upvotes: 0

Views: 110

Answers (2)

mhopeng

Reputation: 1081

This kind of test demonstrates how well the compiled application interacts with the OS/computing environment where it will be used, as opposed to the efficiency of a specific algorithm or architecture. I do this kind of test by running the application three times in a row after a clean reboot/spin-up. I'm looking for any differences caused by the OS loading and caching libraries or runtime environments on the first execution, and I expect the next two run times to be similar to each other (and faster than the first one). If they are not, then more investigation is needed.
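As a rough in-process analogue of that repeat-run check (not a substitute for re-running the full application after a clean reboot), here is a minimal sketch that times the same CUDA workload three times with the host wall clock; the kernel and sizes below are placeholders:

```cpp
// Minimal sketch: time the same CUDA workload three times in one process with
// the host wall clock. The first run is often the slowest because of one-off
// initialisation and cache warm-up; the later runs should agree with each other.
// The kernel and sizes are placeholders.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int n = 1 << 22;
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    for (int run = 1; run <= 3; ++run) {
        auto t0 = std::chrono::steady_clock::now();
        dummyKernel<<<(n + 255) / 256, 256>>>(d, n);
        cudaDeviceSynchronize();                    // include the GPU work in the wall clock
        auto t1 = std::chrono::steady_clock::now();
        std::printf("run %d: %.3f ms\n", run,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    cudaFree(d);
    return 0;
}
```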

Two further comments: if you have a complex application with lots of dependencies, it is difficult to be certain that you know which libraries and runtimes your application requires, and how a given computing environment will handle them. Also, I recommend against specifying a guaranteed runtime to a customer, because it is very hard to control the customer's computing environment. Focus on the things you can control in your application: architecture, algorithms, library versions.

Upvotes: 1

High Performance Mark

Reputation: 78364

How many times should one run the same experiment to account for the variation in the wall clock time measurement?

The question statement assumes that there will be a variation in execution time. Had the question been

How many times should one run CUDA code for performance analysis and workload characterization?

then I would have answered

Once.

Let me explain why ... and give you some reasons for disagreeing with me ...

Fundamentally, computers are deterministic and the execution of a program is deterministic. (Some programs can give an impression of non-determinism but, as discussed below, they do so deterministically unless equipped with exotic peripherals.)

So what might be the causes of a difference in execution times between two runs of the same program?

  1. Physics

Do the bits move faster between RAM and CPU as the temperature of the components varies? I haven't a clue, but if they do I'm quite sure that, within the usual temperature ranges at which computers operate, the relative difference is going to be down in the nano- range. I think any other differences arising from the physics of computation are going to be similarly utterly negligible. The only lesson here, perhaps, is: don't do performance analysis on a program which only takes a microsecond or two to execute.

Note that I ignore, for the purposes of this answer, the capability of some processors to adjust their clock rates in response to their temperature. This would have some (possibly large) impact on a program's execution time, but all you'd learn is how to use it as a thermometer.

  2. Contention for System Resources

By which I mean matters such as other processes (including the operating system) running on the same CPU/core, other traffic on the memory bus, other processes using I/O, and so on. Sure, yes, these may have a major impact on a program's execution time. But what do variations in run times between runs of your program tell you in these cases? They tell you how busy the system was doing other work at the same time, and they make it very difficult to analyse your program's performance.

A lesson here is to run your program on an otherwise quiet machine. Indeed, one of the characteristics of the management of HPC systems in general is that they aim to provide a quiet platform so that user codes see reliable run times.

Another lesson is to avoid including in your measurement of execution time the time taken for operations, such as disk reads and writes or network communications, over which you have no control.

If your program is a heavy user of, say, disks, then you should probably be measuring I/O rates using one of the standard benchmarking codes for the purpose, to get a clear idea of the potential impact on your program.
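As a hedged illustration of keeping I/O out of the measurement, the sketch below times only the transfers and the kernel with CUDA events, while the placeholder loadInput and writeOutput calls (stand-ins for the real file handling) stay outside the timed region:

```cpp
// Sketch: keep disk I/O out of the timed region. loadInput() and writeOutput()
// stand in for the real (untimed) file handling; the kernel is a placeholder.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void computeKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

// Stand-ins for the untimed I/O phases of the real program.
std::vector<float> loadInput() { return std::vector<float>(1 << 20, 1.0f); }
void writeOutput(const std::vector<float>&) {}

int main() {
    std::vector<float> h = loadInput();             // disk read: outside the timed region
    const int n = static_cast<int>(h.size());
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                         // timing starts once input is in memory
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    computeKernel<<<(n + 255) / 256, 256>>>(d, n);
    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("transfers + compute: %.3f ms\n", ms);

    writeOutput(h);                                 // disk write: outside the timed region
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```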

  3. Program Features

There may be aspects of your program which can reasonably be expected to produce different times from one run to the next. For example, if your program relies on randomness then different rolls of the dice might have some impact on execution time. (In this case you might want to run the program more than once to see how sensitive it is to the operations of the RNG.)
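A minimal sketch of that idea, assuming the program's randomness boils down to a host-side draw that decides how much work the GPU does (the kernel and the distribution are placeholders), repeats the run over a few seeds and reports the spread:

```cpp
// Sketch: a run time that depends on a random draw. Repeating over a few seeds
// shows how sensitive the timing is to the RNG. The kernel and the way the
// random number feeds into the workload are placeholders.
#include <chrono>
#include <cstdio>
#include <random>
#include <cuda_runtime.h>

__global__ void work(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = sqrtf(static_cast<float>(i) + 1.0f);
}

int main() {
    const int maxN = 1 << 24;
    float* d;
    cudaMalloc(&d, maxN * sizeof(float));

    for (unsigned seed = 0; seed < 5; ++seed) {
        std::mt19937 rng(seed);
        std::uniform_int_distribution<int> dist(1 << 20, maxN);
        const int n = dist(rng);                    // random draw decides the work done

        auto t0 = std::chrono::steady_clock::now();
        work<<<(n + 255) / 256, 256>>>(d, n);
        cudaDeviceSynchronize();
        auto t1 = std::chrono::steady_clock::now();
        std::printf("seed %u, n = %8d: %.3f ms\n", seed, n,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    cudaFree(d);
    return 0;
}
```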

However, I exclude from this third source of variability the running of the code with different inputs or parameters. If you want to measure the scalability of program execution time with respect to input size then you surely will have to run the program a number of times.
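For that kind of scalability measurement, a minimal sketch (with a placeholder kernel) might time one run per input size:

```cpp
// Sketch: one timed run per input size, to measure how execution time scales
// with the size of the problem. The kernel is a placeholder.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scaleKernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = 2.0f * x[i];
}

int main() {
    const int maxN = 1 << 26;
    float* d;
    cudaMalloc(&d, maxN * sizeof(float));

    for (int n = 1 << 20; n <= maxN; n <<= 2) {     // one run per input size
        auto t0 = std::chrono::steady_clock::now();
        scaleKernel<<<(n + 255) / 256, 256>>>(d, n);
        cudaDeviceSynchronize();
        auto t1 = std::chrono::steady_clock::now();
        std::printf("n = %9d: %.3f ms\n", n,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    cudaFree(d);
    return 0;
}
```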

  4. In conclusion

There is very little of interest to be learned about a program by running it more than once with no difference in the work it is doing from one run to the next.

And yes, in my early days I was guilty of running the same program multiple times to see how the execution time varied. I learned that it didn't, and that's where I got this answer from.

Upvotes: 1
