bob.sacamento

Reputation: 6651

What would cause a process to hang in an MPI_BARRIER call?

I am trying to run an MPI Fortran code. It works fine until it reaches an MPI_BARRIER call near the end of execution. I have confirmed with debug statements that all of the processes reach the call. However, often (though not always) a substantial portion of the processes never return from it. They just hang.

In the spirit of presenting the simplest code that reproduces the problem, I wrote a short toy code that does nothing except call a barrier. It runs just fine, so I have no idea where the problem might lie in the rather large code I am running.
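For reference, a barrier-only test of this kind might look like the sketch below. It is not the exact toy code, just a minimal illustration: the program name `barrier_test` and the message text are hypothetical, and it assumes the `mpi` module and `MPI_COMM_WORLD`. Output is flushed before and after the barrier so that a missing "passed" line points at a hang rather than at buffered output.

```fortran
program barrier_test
  ! Hypothetical minimal test: every rank reports in, calls the barrier,
  ! and reports again once it returns.
  use mpi
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  integer :: ierr, rank, nprocs

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Flush so the message is not lost in a buffer if the rank hangs.
  write(output_unit, '(a,i0,a,i0,a)') 'rank ', rank, ' of ', nprocs, ' reached the barrier'
  flush(output_unit)

  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  write(output_unit, '(a,i0,a)') 'rank ', rank, ' passed the barrier'
  flush(output_unit)

  call MPI_Finalize(ierr)
end program barrier_test
```

Run at the failing process count (for example with `mpirun -np 96`), each rank should print both lines. Any rank that prints only the first line did not return from the barrier.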

This seems to be a problem only with the Intel compiler and Intel MPI (rev. 2023.1.0); other compilers don't have the same problem. Further, it doesn't show up at lower process counts: the problem begins somewhere between 64 procs (which always runs OK) and 96 procs (where I am seeing the hang). I want to emphasize that all the processes are making it to the barrier and calling it.

What could be causing this?

Upvotes: 0

Views: 106

Answers (0)
