janoliver

Reputation: 7824

How to use non-blocking point-to-point MPI routines instead of collectives

In my program, I would like to heavily parallelize many mathematical calculations, the results of which are then written to an output file.

I successfully implemented that using collective communication (gather, scatter, etc.), but I noticed that with these synchronizing routines the slowest processor dominates the execution time and drags down the overall computation speed, as the fast processors spend a lot of time waiting.

So I decided to switch to a scheme where one (master) processor is dedicated to receiving chunks of results and handling the file output, and all the other processors calculate these results and send them to the master using non-blocking send routines.

Unfortunately, I don't really know how to implement the master code. Do I need to run an infinite loop with MPI_Recv(), listening for incoming messages? How do I know when to stop the loop? Can I combine MPI_Isend() and MPI_Recv(), or do both methods need to be non-blocking? How is this typically done?

Upvotes: 0

Views: 93

Answers (1)

Zulan

Reputation: 22670

MPI 3.1 provides non-blocking collectives. I would strongly recommend using those instead of implementing this on your own.

However, it may not help you after all. Eventually you need the data from all processes, even the slow ones. So you are likely to wait at some point again. Non-blocking communication overlaps communication and computation, but it doesn't fix your load imbalances.
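For illustration, here is a minimal sketch of what a non-blocking gather with some overlapped computation could look like. This is not your code; the chunk size and the `compute_chunk()` helper are made up, and the overlap only pays off if there is real work to do between starting the collective and waiting on it.

```c
#include <mpi.h>
#include <stdlib.h>

#define CHUNK 1024  /* made-up number of doubles per chunk */

/* Placeholder for the actual math. */
static void compute_chunk(double *buf, int n, int rank) {
    for (int i = 0; i < n; ++i)
        buf[i] = rank + i * 0.001;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double sendbuf[CHUNK], nextbuf[CHUNK];
    double *recvbuf = NULL;
    if (rank == 0)
        recvbuf = malloc((size_t)size * CHUNK * sizeof(double));

    compute_chunk(sendbuf, CHUNK, rank);

    /* Start the gather to rank 0, but do not block on it yet. */
    MPI_Request req;
    MPI_Igather(sendbuf, CHUNK, MPI_DOUBLE,
                recvbuf, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD, &req);

    /* Overlap: compute the next chunk while the gather is in flight. */
    compute_chunk(nextbuf, CHUNK, rank);

    /* The collective must complete before sendbuf/recvbuf are reused. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (rank == 0) {
        /* ... write recvbuf to the output file ... */
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}
```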

Update (more or less a long clarification comment)

There are several layers to your question; I might have been confused by the title as to what kind of answer you were expecting. Maybe the question is rather:

How do I implement a centralized work queue in MPI?

This pops up regularly, most recently here (a minimal sketch of such a master/worker loop is at the end of this answer). But it is often undesirable, because a central component quickly becomes a bottleneck in large-scale programs. The actual problem you have is that your work decomposition & mapping is imbalanced, so the more fundamental "X question" is:

How do I load balance an MPI application?

At that point you must provide more information about your mathematical problem and its current implementation, preferably in the form of an [mcve]. Again, there is no standard solution. Load balancing is a huge research area; it may even be a topic for CS.SE rather than SO.
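That said, to address the direct question about the master loop: here is a minimal master/worker sketch (my illustration, not a drop-in solution; `CHUNK`, the tags, and the fixed number of chunks per worker are made up). Rank 0 loops on a blocking `MPI_Recv` with `MPI_ANY_SOURCE` and writes results; the workers send their result chunks with `MPI_Isend` and finish with an empty `TAG_DONE` message. The master stops once it has seen `TAG_DONE` from every worker, so no infinite loop is needed.

```c
#include <mpi.h>
#include <stdio.h>

#define CHUNK      1024
#define TAG_RESULT 1
#define TAG_DONE   2

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                     /* master: receive and write */
        double chunk[CHUNK];
        int workers_left = size - 1;
        FILE *out = fopen("results.txt", "w");
        while (workers_left > 0) {
            MPI_Status st;
            MPI_Recv(chunk, CHUNK, MPI_DOUBLE, MPI_ANY_SOURCE,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) {
                --workers_left;          /* this worker has no more results */
            } else {
                for (int i = 0; i < CHUNK; ++i)
                    fprintf(out, "%g\n", chunk[i]);
            }
        }
        fclose(out);
    } else {                             /* worker: compute and send */
        double chunk[CHUNK];
        for (int piece = 0; piece < 4; ++piece) {   /* pretend 4 chunks each */
            for (int i = 0; i < CHUNK; ++i)
                chunk[i] = rank * 1000.0 + piece + i * 0.001;  /* placeholder math */
            MPI_Request req;
            MPI_Isend(chunk, CHUNK, MPI_DOUBLE, 0, TAG_RESULT,
                      MPI_COMM_WORLD, &req);
            /* ... more computation could overlap here ... */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* buffer is reused next iteration */
        }
        MPI_Send(NULL, 0, MPI_DOUBLE, 0, TAG_DONE, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Note that this only moves the waiting to rank 0; it does not fix the imbalance between the workers themselves.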

Upvotes: 3
