Amruth

Reputation: 1

Error in MPI broadcast

Sorry for the long post. I read through some other MPI broadcast related questions, but I couldn't figure out why my program is failing. I am new to MPI and am facing this problem. First I will explain what I am trying to do:

My declarations:

#define ROWTAG 400
#define COLUMNTAG 800

  1. Create a 2 X 2 Cartesian topology.
  2. Rank 0 has the whole matrix. It needs to distribute parts of the matrix to all the processes in the 2 X 2 Cartesian topology. For now, instead of a matrix I am just dealing with integers. So for process P(i,j) in the 2 X 2 Cartesian topology (i - row, j - column), I want it to receive (ROWTAG + i) in one message and (COLUMNTAG + j) in another message.
  3. My strategy to do so is as follows, with processes P(0,0), P(0,1), P(1,0), P(1,1):

P(0,0) has all the initial data.

P(0,0) sends (ROWTAG+1) (in this case 401) to P(1,0) - in essence, P(1,0) is responsible for disseminating the information related to row 1 to all the processes in row 1. I just used a blocking send.

P(0,0) sends (COLUMNTAG+1) (in this case 801) to P(0,1) - in essence, P(0,1) is responsible for disseminating the information related to column 1 to all the processes in column 1. Again a blocking send.

For each process, I made a row_group containing all the processes in that row, and from it created row_comm (a communicator object).

For each process, I made a col_group containing all the processes in that column, and from it created col_comm (a communicator object).

At this point, P(0,0) has given the information related to row i to process P(i,0), and the information related to column j to P(0,j). I call P(i,0) and P(0,j) the row_head and col_head respectively.

For process P(i,j), P(i,0) provides the information related to row i, and P(0,j) provides the information related to column j.
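The setup described above can be sketched roughly as follows. This is a reconstruction, not the actual pastebin code; the actual program is at the link below, and names such as matrixComm, row_group, and row_comm are assumptions based on the description:

```c
#include <mpi.h>

#define ROWTAG    400
#define COLUMNTAG 800

int main(int argc, char **argv)          /* run with 4 processes */
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* 1. Create the 2 x 2 Cartesian topology */
    int dims[2] = {2, 2}, periods[2] = {0, 0};
    MPI_Comm matrixComm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &matrixComm);

    /* With reorder = 0 the Cartesian rank equals the world rank */
    int coords[2];
    MPI_Cart_coords(matrixComm, world_rank, 2, coords); /* coords[0] = row, coords[1] = col */

    /* 2. Build a group and communicator for this process's row
       (the column case is analogous) */
    MPI_Group matrix_group, row_group;
    MPI_Comm_group(matrixComm, &matrix_group);
    int row_ranks[2] = { coords[0] * 2, coords[0] * 2 + 1 }; /* row-major rank layout */
    MPI_Group_incl(matrix_group, 2, row_ranks, &row_group);
    MPI_Comm row_comm;
    MPI_Comm_create(matrixComm, row_group, &row_comm);

    /* ... col_group / col_comm, the two sends from P(0,0), broadcasts ... */

    MPI_Finalize();
    return 0;
}
```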

I used a broadcast call:

MPI_Bcast(&row_data, 1, MPI_INT, row_head, row_comm);
MPI_Bcast(&col_data, 1, MPI_INT, col_head, col_comm);

Please find my code here: http://pastebin.com/NpqRWaWN

Here is the error I see:

*** An error occurred in MPI_Bcast
*** on communicator MPI COMMUNICATOR 5 CREATE FROM 3
*** MPI_ERR_ROOT: invalid root
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

Also please let me know if there is any better way to distribute the matrix data.

Upvotes: 0

Views: 2390

Answers (1)

Hristo Iliev

Reputation: 74365

There are several errors in your program. First, row_Ranks is declared one element too small, and writing past its end can overwrite other stack variables:

int col_Ranks[SIZE], row_Ranks[SIZE-1];
//                             ^^^^^^

On my test system the program just hangs because of that.

Second, you create new subcommunicators out of matrixComm, but then use rank numbers from matrixComm to address processes in those subcommunicators when performing the broadcast. That doesn't work. For example, in a 2x2 Cartesian communicator the ranks range from 0 to 3, but any column- or row-wise subgroup contains only two processes, with ranks 0 and 1 - there is no rank 2 or rank 3. If you look at the value of row_head across the ranks, it is 2 in two of them, hence the "invalid root" error.

For a much better way to distribute the data, refer to this extremely informative answer.
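Independently of that, the standard MPI primitive for exactly this slicing is MPI_Cart_sub: it splits a Cartesian communicator into row or column subcommunicators directly, with consistently renumbered ranks, so no manual group bookkeeping is needed. A minimal sketch (matrixComm is assumed to be the 2x2 Cartesian communicator from the question):

```c
#include <mpi.h>

/* Split a 2x2 Cartesian communicator into this process's row and
 * column subcommunicators using MPI_Cart_sub. */
void make_row_col_comms(MPI_Comm matrixComm, MPI_Comm *row_comm, MPI_Comm *col_comm)
{
    int keep_col_dim[2] = {0, 1}; /* vary the column index -> one comm per row    */
    int keep_row_dim[2] = {1, 0}; /* vary the row index    -> one comm per column */
    MPI_Cart_sub(matrixComm, keep_col_dim, row_comm);
    MPI_Cart_sub(matrixComm, keep_row_dim, col_comm);

    /* MPI_Cart_sub preserves the Cartesian ordering, so the process with
     * coordinate 0 in the kept dimension gets rank 0 in each subcommunicator.
     * The row/column head is therefore simply root 0 in every broadcast:
     *     MPI_Bcast(&row_data, 1, MPI_INT, 0, *row_comm);
     *     MPI_Bcast(&col_data, 1, MPI_INT, 0, *col_comm);                  */
}
```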

Upvotes: 1
