Questions About the MPI IOs

Question

In MPI programming, when should I use the collective operations for I/O? And when should I prefer the shared file pointer I/O operations over the individual file pointer ones?

MakisH · Accepted Answer

Collective MPI I/O is usually better optimized, and you may prefer it when your code has regular I/O points that all the processes reach at the same time. The implementation can let fewer processes do the actual writing (e.g. one per node), so that fewer but bigger chunks of data are written and the overhead is minimized. It may also start gathering the data before the actual writing.

For example, if you have a nicely decomposed domain for your problem, and you want to write your updated values at the end of each timestep, this is a good choice.
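As a rough sketch of that timestep scenario, assuming a 1D array of doubles split into contiguous blocks, something like the following could work. The names block_count, block_start, write_timestep and the USE_MPI macro are hypothetical and not part of MPI. The MPI part is compiled only when USE_MPI is defined, so the decomposition arithmetic can be checked without an MPI installation.

```c
#include <assert.h>

/* Number of elements owned by `rank` when `n` elements are split into
   near-equal contiguous blocks over `nprocs` ranks (hypothetical helper). */
long block_count(long n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = n / nprocs;
    long extra = n % nprocs;            /* the first `extra` ranks get one more */
    return base + (rank < extra ? 1 : 0);
}

/* Global index of the first element owned by `rank` (hypothetical helper). */
long block_start(long n, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = n / nprocs;
    long extra = n % nprocs;
    return rank * base + (rank < extra ? rank : extra);
}

#ifdef USE_MPI
#include <mpi.h>

/* Collective write of one timestep: every rank in `comm` must call this.
   Assumed layout: timestep `step` occupies n_global doubles in the file,
   one timestep after the other. For brevity the file is opened on every
   call; a real code would likely keep it open across timesteps. */
int write_timestep(MPI_Comm comm, const char *path, int step,
                   long n_global, const double *local)
{
    int rank, nprocs;
    MPI_File fh;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int err = MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                            MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS)
        return err;

    /* Byte offset of this rank's block inside the file. */
    MPI_Offset offset =
        ((MPI_Offset)step * n_global + block_start(n_global, nprocs, rank))
        * (MPI_Offset)sizeof(double);

    /* The _all suffix makes this the collective explicit-offset write. */
    err = MPI_File_write_at_all(fh, offset, local,
                                (int)block_count(n_global, nprocs, rank),
                                MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    return err;
}
#endif
```

With 10 elements on 4 ranks, the helpers give block sizes 3, 3, 2, 2 starting at indices 0, 3, 6, 8, so the blocks tile the file without gaps or overlap.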

The collective operations are marked by the _all part in their name, and the "opposite" of them are the single task (independent) operations, without the _all, which each process calls on its own (e.g. you may have some processes writing different data than others). Both kinds come in blocking and non-blocking versions; for the collective ones, the non-blocking form is the split-collective _begin/_end pair or, since MPI 3.1, a routine such as MPI_File_iwrite_all. Keep in mind that "collective" doesn't imply "blocking".

As you already found out, both the single task and the collective operations exist in an "individual file pointer" version (the simplest), an "explicit offset" version (_at) and a "shared file pointer" version (_shared for single task, _ordered for collective).

You may use individual file pointers when you want each process to write its own file. This can be better when you have a lot of data to write per process as well as many nodes, so that writing locally reduces the network bandwidth used. I don't know in which scenarios and filesystems exactly this may be useful, but keep in mind that in "normal" problems it is usually better to have few, big data streams rather than many, small ones, to reduce the overhead. You may also have some post-processing reasons for this, or your processes may simply not all be writing the same kind of data.

When talking about the same file:

You may use the explicit offset to point each process to a different point in your file.

You may use a shared pointer version mainly when you work with groups of processes: each process can take the shared pointer as a reference and write at its appropriate location after it.
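To make the shared pointer case more concrete, here is a minimal sketch, assuming every rank appends one variable-length text record to the same file. The names ordered_layout, append_records and the USE_MPI macro are hypothetical. The plain C helper only models where a collective ordered write would place each rank's data; the MPI part is compiled only when USE_MPI is defined.

```c
#include <assert.h>

/* Model of a collective ordered write: rank r starts at the shared pointer
   plus the sizes written by ranks 0..r-1. Fills start[] and returns the
   position of the shared pointer after the call (hypothetical helper).
   Sizes are in bytes, which matches the default file view. */
long ordered_layout(long shared_ptr, const long *nbytes, int nprocs,
                    long *start)
{
    assert(nprocs > 0);
    long pos = shared_ptr;
    for (int r = 0; r < nprocs; ++r) {
        start[r] = pos;
        pos += nbytes[r];
    }
    return pos;
}

#ifdef USE_MPI
#include <mpi.h>
#include <stdio.h>

/* Every rank in `comm` appends one text record. No rank has to know the
   record lengths of the others; the library places them in rank order. */
int append_records(MPI_Comm comm, const char *path)
{
    int rank;
    char line[64];
    MPI_File fh;

    MPI_Comm_rank(comm, &rank);
    int len = snprintf(line, sizeof line, "rank %d reporting\n", rank);

    int err = MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                            MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS)
        return err;

    /* Collective shared-pointer write: data lands in rank order. The
       independent counterpart, MPI_File_write_shared, takes the same
       arguments but gives no guaranteed order between ranks. */
    err = MPI_File_write_ordered(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    return err;
}
#endif
```

For example, with the shared pointer at byte 100 and three ranks writing 5, 0 and 7 bytes, the model puts them at bytes 100, 105 and 105 and leaves the shared pointer at 112. With explicit offsets, each rank would have to compute these positions itself.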

Keep in mind that the pointer is also tied to the file view, but that is another big topic.
