Matt

Reputation: 27

I don't see what the issue is in my program in MPI

I don't know how to fix the problem with this program. Its purpose is to add up all the numbers in an array, but I can barely manage to send the arrays before errors start to appear. It has to do with the for loop in the if(my_rank != 0) section.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[]){
 int my_rank, p, source, dest, tag, total, n = 0;
 MPI_Status status;

 MPI_Init(&argc, &argv);
 MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
 MPI_Comm_size(MPI_COMM_WORLD, &p);

 //15 processors(1-15) not including processor 0
 if(my_rank != 0){
  MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
  int arr[n];
  MPI_Recv( arr, n, MPI_INT, source, tag, MPI_COMM_WORLD, &status);

  //printf("%i ", my_rank);
  int i;
  for(i = ((my_rank-1)*(n/15)); i < ((my_rank-1)+(n/15)); i++ ){
   //printf("%i ", arr[0]);
  }

 }
 else{
  printf("Please enter an integer:\n");
  scanf("%i", &n);

  int i;
  int arr[n];

  for(i = 0; i < n; i++){
   arr[i] = i + 1;
  }

  for(dest = 0; dest < p; dest++){
   MPI_Send( &n, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
   MPI_Send( arr, n, MPI_INT, dest, tag, MPI_COMM_WORLD);
  }
 }
 MPI_Finalize();
}

When I take that for loop out, it compiles and runs, but when I put it back in, it just stops working. Here is the error it gives me:

[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank
[compute-0-24.local:1072] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Please enter an integer:
--------------------------------------------------------------------------
mpirun has exited due to process rank 8 with PID 1072 on
node compute-0-24 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-16.local][[31957,1],0][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.4.237 failed: Connection refused (111)
[cs-cluster:11677] 14 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[cs-cluster:11677] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Upvotes: 1

Views: 2915

Answers (2)

Hristo Iliev

Reputation: 74475

Learn to read the informative error messages that Open MPI gives you and to apply some general debugging strategies.

[compute-0-24.local:1072] *** An error occurred in MPI_Recv
[compute-0-24.local:1072] *** on communicator MPI_COMM_WORLD
[compute-0-24.local:1072] *** MPI_ERR_RANK: invalid rank

The library is telling you that the receive operation was called with an invalid rank value. Armed with that knowledge, you take a look at your code:

int my_rank, p, source, dest, tag, total, n = 0;
...
//15 processors(1-15) not including processor 0
if(my_rank != 0){
  MPI_Recv( &n, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &status);
  ...

The rank is source. source is an automatic variable declared a few lines earlier but never initialised, so its value is indeterminate. You fix it by assigning source an initial value of 0, or by simply replacing it with 0, since you have already hard-coded the rank of the sender by placing its code in the else branch of the if statement.

The presence of this error should prompt you to examine the other variables too. You then notice that tag is also used uninitialised, and you either initialise it (e.g. to 0) or replace it with a constant altogether.

Now your program is almost correct. It seems to work fine for n up to about 33000 (the default eager limit of the self transport divided by sizeof(int)), but it hangs for larger values. You either fire up a debugger or simply add a printf statement before and after each send and receive operation, and discover that the very first call to MPI_Send, the one with dest equal to 0, never returns. You then take a closer look at your code and discover this:

for(dest = 0; dest < p; dest++){

dest starts from 0, but this is wrong since rank 0 is only sending data and not receiving. You fix it by setting the initial value to 1.

Your program should now work as intended (or at least for values of n that do not lead to stack overflow in int arr[n];). Congratulations! Now go and learn about MPI_Probe and MPI_Get_count, which will help you do the same without explicitly sending the length of the array first. Then learn about MPI_Scatter and MPI_Reduce, which will enable you to implement the algorithm even more elegantly.

Upvotes: 1

Gilles

Reputation: 9519

There are two problems in the code you posted:

  1. The send loop starts from dest=0, which means that the process of rank zero will send to itself. However, since there's no receiving part for process zero, this won't work. Just make the loop start from dest=1 and that should solve it.
  2. The tag you use isn't initialised. So its value can be anything (which is OK in itself), but it can be a different value on each process, in which case the sends and receives will never match each other. Just initialise tag=0, for example, and that should fix it.

With this, your code snippet should work.

Upvotes: 1
