Reputation: 5434

mpirun performance analysis

I'm running mpirun (OpenMPI) with 86 processes on 12 CPUs and 2 GPUs on Ubuntu 18.04. The application trains neural networks.

After a day or so of training, the iterations slow down dramatically. The code works fine on a single thread, network traffic (file reads) is well within spec, and neither the CPUs nor the GPUs show excessive load.

So I suspect the problem lies with mpirun.

Are there non-intrusive tools available to show the performance of the MPI runs? I've been looking at Performance Co-Pilot but I don't see any MPI profiling in the software itself.

Upvotes: 1

Views: 264

Answers (1)

anegru

Reputation: 1123

Callgrind and KCachegrind might be useful. The Open MPI FAQ section on parallel debuggers [1] may help you as well.

[1] https://www.open-mpi.org/faq/?category=debugging#parallel-debuggers
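Because Callgrind slows the program down considerably, a lighter first step might be to time the compute and communication phases of each iteration yourself and watch which one grows over the day. Below is a minimal sketch using only the Python standard library. It is not an MPI profiler, and `PhaseTimer`, `train_step` and `exchange_gradients` are hypothetical placeholders for whatever your training loop actually calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # total seconds spent in each phase
        self.counts = defaultdict(int)    # number of times each phase ran

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Mean seconds per call for each phase seen so far.
        return {n: self.totals[n] / self.counts[n] for n in self.totals}


def train_step():
    time.sleep(0.001)  # stand-in for the local forward/backward pass


def exchange_gradients():
    time.sleep(0.002)  # stand-in for the MPI communication call


timer = PhaseTimer()
for _ in range(5):
    with timer.phase("compute"):
        train_step()
    with timer.phase("communicate"):
        exchange_gradients()

means = timer.report()
print(means)
```

If each rank logs `timer.report()` every few hundred iterations, a growing "communicate" mean would point towards MPI or a straggling rank, while a growing "compute" mean would point back at the application.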

Upvotes: 1
