Reputation: 1041
I have a basic question regarding MPI, to get a better understanding of it (I am new to MPI and multi-process programming, so please bear with me on this one). I am using a simulation environment in C++ (RepastHPC) that makes extensive use of MPI (via the Boost libraries) to allow parallel execution. In particular, the simulation consists of multiple instances of the respective classes (i.e. agents) that are supposed to interact with each other, exchange information, etc. Now, given that this takes place across multiple processes (and given my rudimentary understanding of MPI), my natural question or fear is that agents on different processes can no longer interact with each other because they cannot connect (I know this contradicts the entire idea of MPI).
After reading the manual, my understanding is this: the Boost.MPI libraries (and also the libraries of the above-mentioned package) take care of all of the communication, sending packages back and forth between processes. That is, each process holds copies of the instances that live on other processes (I guess this is some form of call by value, because the original instance cannot be changed from a process that only has a copy), then an update takes place to ensure that the copies have the same information as the originals, and so on.
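To check whether my mental model is right, here is a minimal sketch of what I imagine happens under the hood (the `Agent` class and its fields are purely made up for illustration; this uses Boost.MPI and would be run with at least two processes):

```cpp
#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <iostream>
#include <string>

// Hypothetical agent type, not from RepastHPC.
struct Agent {
    int id;
    std::string state;

    // Boost.Serialization hook: this is what lets Boost.MPI ship the
    // object between processes by value.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id;
        ar & state;
    }
};

int main(int argc, char* argv[]) {
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    if (world.rank() == 0) {
        Agent a;
        a.id = 42;
        a.state = "original";
        world.send(1, 0, a);             // ship a serialized copy to rank 1
    } else if (world.rank() == 1) {
        Agent copy;
        world.recv(0, 0, copy);          // rank 1 now holds its own copy
        copy.state = "changed locally";  // this does NOT affect rank 0's original
        std::cout << "rank 1 holds agent " << copy.id << std::endl;
    }
    return 0;
}
```

So my picture is that changing the copy on one process leaves the original on the other process untouched until some explicit update message is exchanged.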
Does this mean that, in terms of the final outcome of a simulation run, I get the same result as if I were doing the entire thing in a single process? Put differently, are the multiple processes just supposed to speed things up without changing the design of the simulation (so I don't have to worry about it)?
Upvotes: 0
Views: 352
Reputation: 9062
I think you have a fundamental misunderstanding of MPI here. MPI is not an automatic parallelization library. It isn't a distributed shared memory mechanism. It doesn't do any magic for you.
What it does do is make it simpler to communicate between different processes, whether on the same machine or on different machines. Each process has its own address space, which does not overlap with those of the other processes (unless you're doing something else outside of MPI). Assuming you set up your MPI installation correctly, it takes care of all the pain of setting up the communication channels between your processes for you. It also gives you some higher-level abstractions, such as collective communication.
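To make the separate-address-space point concrete, here's a minimal sketch (the variable names are mine, purely for illustration): every process has its own `x`, and nothing propagates between processes unless you explicitly communicate, e.g. with the collective `MPI_Bcast()`.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int x = 0;        // each process has its own private x
    if (rank == 0) {
        x = 99;       // only rank 0's copy changes
    }
    // Without communication, the other ranks still see x == 0:
    printf("before bcast: rank %d, x = %d\n", rank, x);

    // To share rank 0's value, you must communicate explicitly,
    // here with a collective broadcast from root rank 0:
    MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("after  bcast: rank %d, x = %d\n", rank, x);

    MPI_Finalize();
    return 0;
}
```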
When you use MPI, you compile your code differently than normal. Instead of `g++ -o code code.cpp` (or whatever your compiler is), you use `mpicxx -o code code.cpp`. This will automatically link in all of the MPI stuff necessary. Then when you run your application, you use `mpiexec -n <num_processes> ./code` (other arguments are optional, though in practice you'll usually want some). The `num_processes` argument tells MPI how many processes to launch. This isn't decided at compile/link time.
You will also have to rewrite your code to use MPI. MPI has lots of functions you can use (the standard is available online, and there are lots of tutorials on the web that are easier to understand). The basics are `MPI_Send()` and `MPI_Recv()`, but there's lots and lots more. You'll have to find a tutorial for that.
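To give you a flavor, a minimal point-to-point exchange looks roughly like this (a generic sketch, not specific to your simulation; run it with at least two processes):

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 123;
        // Send one int to rank 1, using message tag 0.
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int value = 0;
        // Receive one int from rank 0 with the matching tag 0.
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```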
Upvotes: 5