user19470144

Why does the "MPI_Comm_split" function always fail to split processes into different subcommunicators?

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
//Please run this program with 4 processes
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    // Check that 4 MPI processes are used
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    if (comm_size != 4)
    {
        printf("This application is meant to be run with 4 MPI processes, not %d.\n", comm_size);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    // Get my rank in the global communicator
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    // Determine the colour and key based on whether my rank is even.
    int colour;
    int key;
    if (my_rank % 2 == 0)
    {
        colour = 0;
        key = my_rank;
    }
    else
    {
        colour = 1;
        key = comm_size - my_rank;
    }
    // Split the global communicator
    MPI_Comm new_comm;
    MPI_Comm_split(MPI_COMM_WORLD, colour, key, &new_comm);
    // Get my rank in the new communicator
    int my_new_comm_rank;
    MPI_Comm_rank(new_comm, &my_new_comm_rank);
    // Print my new rank and new communicator
    printf("I am process %d and I belong to %x\n", my_rank,new_comm);
    MPI_Finalize();
    return EXIT_SUCCESS;
}

The code above is supposed to divide 4 processes into 2 different subcommunicators, with processes 0 and 2 in one and processes 1 and 3 in the other. However, the output of this program is:

I am process 3 and I belong to 84000000
I am process 1 and I belong to 84000000
I am process 2 and I belong to 84000000
I am process 0 and I belong to 84000000

What doesn't make sense to me is that they all appear to belong to the same subcommunicator (84000000), as if the call had failed to split them into different subcommunicators. By the way, I am running this on Windows with MS-MPI.

Upvotes: 1

Views: 153

Answers (1)

Victor Eijkhout

Reputation: 5794

You are thinking in shared memory terms. MPI uses distributed memory: each process has its own address space, and a communicator handle is only meaningful inside the process that holds it. Thus, the value 84000000 printed by one process refers to a completely different object from the same value printed by another process. It's pure coincidence if they print the same value.

So you might wonder, how can I test if these subcommunicators are indeed the same? And the answer is: you can't. If two processes are in different communicators, they can't even see the other one. Think about it: how would you have a handle to a communicator that you are not in?

Upvotes: 3
