Reputation: 65
I have written an MPI algorithm that solves a problem and have been testing it with a varying number of processes. Interestingly, NP 2 performs better than NP 4 or NP 1, which probably has to do with my implementation. What I would like to do is measure the communication costs in the simplest possible form, perhaps with a counter that simply gets incremented.
My question is: at which place in the code would I put the counter? Whenever the program calls MPI_SEND?
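Something like the following sketch is what I have in mind (COUNTED_SEND and the gather-to-rank-0 pattern are just made up to illustrate the idea):

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical global counter of sends issued by this process. */
static long send_count = 0;

/* Wrapper so that existing MPI_Send call sites only need renaming. */
#define COUNTED_SEND(buf, cnt, type, dest, tag, comm) \
    (send_count++, MPI_Send((buf), (cnt), (type), (dest), (tag), (comm)))

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int token = rank;
    if (rank > 0) {
        COUNTED_SEND(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int i = 1; i < size; i++)
            MPI_Recv(&token, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    printf("rank %d issued %ld sends\n", rank, send_count);
    MPI_Finalize();
    return 0;
}
```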
PS: I am aware of mpiP, but I believe it would be overkill for this small project.
Upvotes: 2
Views: 930
Reputation: 22670
The simplest way is to use a tool. Using a proper MPI performance analysis tool is not overkill; it is the correct and best way to go, even for small projects.
Evidently you cannot provide a minimal example for us to look at, nor do you have an intuitive and correct understanding of the parallel execution of your code, so the project is most definitely large enough to benefit from a proper tool.
Understanding parallel performance is difficult, and you cannot do it based on local information alone. To diagnose the issue by hand, you would have to create parallel data structures and introduce additional communication into your code. All of this makes the code unnecessarily complex, and you might even introduce new performance issues from naively added manual analysis code.
Upvotes: 0
Reputation: 40614
The interesting part is the timing of your MPI calls, and the easiest way to measure the time spent in those calls is to use MPI_Wtime(). So, just wrap your communication-heavy code between two calls to MPI_Wtime() and print the difference.
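A minimal sketch of this (the ring exchange is just a stand-in for your own communication-heavy code):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendval = rank, recvval;
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;

    double t0 = MPI_Wtime();
    /* Communication-heavy region: here a simple ring exchange. */
    MPI_Sendrecv(&sendval, 1, MPI_DOUBLE, next, 0,
                 &recvval, 1, MPI_DOUBLE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t1 = MPI_Wtime();

    printf("rank %d: %f s spent communicating\n", rank, t1 - t0);
    MPI_Finalize();
    return 0;
}
```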
Background info:
All MPI calls are relatively costly functions since they have to contend with network latency, so it is wise not to use three MPI calls where one suffices. But this kind of optimization should be clear from your code; there is no need to profile for that.
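For example (a, b, c, and dest here are hypothetical stand-ins for your own data):

```c
#include <mpi.h>

/* Three separate sends pay the latency three times:
 *   MPI_Send(&a, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
 *   MPI_Send(&b, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
 *   MPI_Send(&c, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
 * Packing the values into one buffer pays it once. */
void send_three(double a, double b, double c, int dest)
{
    double msg[3] = { a, b, c };
    MPI_Send(msg, 3, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
}
```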
In most real-world programs, the bigger performance impact is from synchronization: Most MPI calls simply cannot complete until the communication partners have entered their respective calls. As such, if one process takes a millisecond longer than the other processes, all processes will typically be delayed by that millisecond. And these effects are only visible from the execution times of the individual MPI calls.
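A minimal sketch of that effect, assuming a POSIX system for usleep(): rank 0 is artificially delayed by a millisecond, and the wait shows up in the other ranks' time inside MPI_Barrier:

```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Simulated load imbalance: rank 0 "computes" 1 ms longer. */
    if (rank == 0)
        usleep(1000);

    double t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);   /* the fast ranks wait here */
    double t1 = MPI_Wtime();

    printf("rank %d spent %f s inside MPI_Barrier\n", rank, t1 - t0);
    MPI_Finalize();
    return 0;
}
```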
Upvotes: 3