MPI_Waitany is not waiting for some processes

I am trying to use dynamic process creation with MPI, but I am having problems receiving the responses from the child processes. I created an array called sum_partial to store the result of each child. When I run with 4 processes (1 parent process + 3 children) this works fine, but when I run with more processes, sum_partial does not receive the result of some children.

I'm not sure whether the problem is in MPI_Irecv or MPI_Waitany. I've tried other approaches using MPI_Wait, MPI_Waitall, and MPI_Test, but the problem persists.

Here is the MPI block of code I'm using in the parent process.

//Some code here...

for(j=0;j<num_Proc;j++){
    sprintf(argv[2], "%llu", vetIni[j+1]);
    sprintf(argv[3], "%llu", vetEnd[j+1]);
    MPI_Comm_spawn(bin, argv, 1, localInfo, 0, MPI_COMM_SELF, &intercommChild[j], err);
}

long long int sum_local=0, *sum_partial = calloc(num_Proc, sizeof(long long int));

for(j=0;j<num_Proc;j++)
    MPI_Irecv(&sum_partial[j], 1, MPI_LONG, 0, 99, intercommChild[j], &req[j]);

long long int ini = vetIni[0], end = vetEnd[0];

for(i=ini;i<end;i++)
    sum_local += i * (N-i); //Parent process does its own computation

for(j=0;j<num_Proc;j++){
    MPI_Waitany(num_Proc, req, &source, MPI_STATUS_IGNORE);
    sum_local += sum_partial[j]; //Sum all results
}

MPI_Finalize();

And here is the code that a child process runs.

//Some code here...

long long int ini = atol(argv[2]);
long long int end = atol(argv[3]);
long long int sum=0, i;

for(i=ini;i<end;i++)
    sum += i*(N-i);

MPI_Send(&sum, 1, MPI_LONG, 0, 99, intercommPai);

MPI_Finalize();

If I print sum_partial when running with 7 children, it looks like this:

-8393498447644280608
4191132954560973024
0
0
-3708736119148578592
9184626552355719392
-903258050952161056

These zeros are not supposed to be there. The other results are right.

Can anyone identify what the problem is in my code?

Thank You.

Upvotes: 0

Views: 244

Answers (1)

Gilles Gouaillardet

Reputation: 8380

Here is your loop on the master:

for(j=0;j<num_Proc;j++){
    MPI_Waitany(num_Proc, req, &source, MPI_STATUS_IGNORE);
    sum_local += sum_partial[j]; //Sum all results
}

So at iteration j you wait for data from any task, but then you implicitly assume the data from task j was received (i.e. you access sum_partial[j], even though MPI_Waitany may have completed a different request).

You can either wait for the requests in order:

for(j=0;j<num_Proc;j++){
    MPI_Wait(&req[j], MPI_STATUS_IGNORE);
    sum_local += sum_partial[j]; //Sum all results
}

or, more likely, use the index returned by MPI_Waitany (stored in source here) to access the element that was actually received:

for(j=0;j<num_Proc;j++){
    MPI_Waitany(num_Proc, req, &source, MPI_STATUS_IGNORE);
    sum_local += sum_partial[source]; //Sum all results
}
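
If you do not need to overlap the accumulation with the communication at all, a third option is to complete every request in one call and only then sum the results. Here is a minimal sketch, assuming the same num_Proc, req, sum_partial and sum_local variables as in your code:

MPI_Waitall(num_Proc, req, MPI_STATUSES_IGNORE); //Complete every receive first

for(j=0;j<num_Proc;j++)
    sum_local += sum_partial[j]; //Safe: all requests have completed

Either way, the key point is that sum_partial[j] may only be read after the request that fills it has completed.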

Just to be clear: you thought it did "work" with 4 tasks, but you were just lucky.

Upvotes: 2
