Reputation: 2629
I have been using gprof to benchmark my code, but since I started parallelizing it I have realized that gprof doesn't give useful output.
How can I profile it, or otherwise find the bottlenecks?
I've heard of Scalasca and TAU, but they seem a bit overkill.
Upvotes: 2
Views: 1634
Reputation: 1750
If you have access to Intel's commercial software, there are a couple of very useful tools. With Intel VTune Amplifier you can examine the hotspots of a serial run as well as how effectively your cores are used (the image shows a summary graph for a 24-core OpenMP run).
Upvotes: 1
Reputation: 22670
The simplest tool to use is perf. It can easily be installed on any Linux system and works fairly well with OpenMP or other threaded applications.
You can look at live performance simply by running sudo perf top on your command line. This will tell you the functions and source code lines that are currently consuming the most CPU resources, much like top does for whole processes.
Running your application under perf record and then calling perf report:
perf record ./your-program your-parameter
perf report
will present a profile on a per-function and per-source-line basis. There are many options for tuning perf, e.g. enabling call graph recording with -g.
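As a rough sketch of the record/report workflow with call graphs enabled, the two steps could be wrapped in a small shell function. The function name profile_run and its output-file argument are made up for illustration; -g, -o, -i and --stdio are regular perf options, but check perf help record and perf help report on your system, since available features vary between perf versions.

```shell
# Hypothetical helper (name and argument order are assumptions, not part of perf).
# Usage: profile_run perf.data ./your-program your-parameter
profile_run() {
    out="$1"   # file the samples are written to
    shift      # the remaining arguments are the command to profile

    # Record samples with call graph information (-g) into "$out".
    # Everything after "--" is treated as the profiled command.
    perf record -g -o "$out" -- "$@" || return 1

    # Print the profile as plain text instead of opening the interactive UI.
    perf report -i "$out" --stdio
}
```

Writing each run to its own file makes it easier to compare profiles taken with different thread counts, for example by setting OMP_NUM_THREADS before each call.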
However, while threads are supported, you cannot easily distinguish them, so you won't know which thread showed which performance characteristics. For that you should resort to more specialized HPC tools, even if they seem a bit overkill. Keep in mind that analyzing parallel performance is not simple, no matter what tool you use.
Free tools would be:
Upvotes: 7