Defcon97
MPI_Bcast not Bcasting

I am running an MPI application on 32 processes. The stdout of the rank 0 process is sent to a separate file for startup error logging, which I will call STARTUP_ERROR, while the stdout of all other processes is sent to a different log file.

I am running the Intel distribution of MPI, version 2021.10.0.

My issue is really simple: I am trying to pass three ints around from rank 0, which reads them from a text file. Here is the code, followed by its output:

...

int nhb,nor,rnfreq;

if (myrank==0)
{
   find_section("HBC parameters");
   read_line("nhb","%d",&nhb);
   read_line("nor","%d",&nor);
   read_line("rnfreq","%d",&rnfreq);
}

printf("Rank %d: sees\t nhb= %d \t nor= %d \t rnfreq= %d\n", myrank, nhb, nor, rnfreq);
fflush(stdout);

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&nhb,1,MPI_INT,0,MPI_COMM_WORLD);
printf("Rank %d: sees\t nhb= %d \n", myrank, nhb);
fflush(stdout);

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&rnfreq,1,MPI_INT,0,MPI_COMM_WORLD);

printf("Rank %d: sees\t rnfreq= %d \n", myrank, rnfreq);

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&nor,1,MPI_INT,0,MPI_COMM_WORLD);

printf("Rank %d: sees\t nor= %d \n", myrank, nor);
MPI_Barrier(MPI_COMM_WORLD);

printf("Rank %d: sees\t nhb= %d \t nor= %d \t rnfreq= %d\n", myrank, nhb, nor, rnfreq);

hbc=set_hbc_parms(nhb,nor,rnfreq);
...

Before you scream at me: I know I don't actually need all those barriers and flushes. I have just been fiddling with this for a while, trying to understand what's wrong.

If I run this, the file with the rank 0 output contains:

Rank 0: sees     nhb= 10     nor= 20     rnfreq= 20
Rank 0: sees     nhb= 10 
Rank 0: sees     rnfreq= 20 
Rank 0: sees     nor= 20 
Rank 0: sees     nhb= 10     nor= 20     rnfreq= 20

which is perfectly fine. The other processes, however, print:

Rank 14: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 18: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 24: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 30: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 11: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 15: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 17: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 29: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 1: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 2: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 4: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 5: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 6: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 7: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 10: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 13: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 16: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 20: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 21: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 22: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 23: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 26: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 27: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 31: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 3: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 8: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 9: sees     nhb= 0      nor= 0      rnfreq= 0
Rank 12: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 19: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 1: sees     nhb= 1 
Rank 1: sees     rnfreq= 4936495 
Rank 2: sees     nhb= 1 
Rank 2: sees     rnfreq= 4936495 
Rank 3: sees     nhb= 1 
Rank 3: sees     rnfreq= 4936495 
Rank 4: sees     nhb= 1 
Rank 5: sees     nhb= 1 
Rank 5: sees     rnfreq= 4936495 
Rank 6: sees     nhb= 1 
Rank 7: sees     nhb= 1 
Rank 8: sees     nhb= 1 
Rank 8: sees     rnfreq= 4936495 
Rank 9: sees     nhb= 1 
Rank 9: sees     rnfreq= 4936495 
Rank 10: sees    nhb= 1 
Rank 10: sees    rnfreq= 4936495 
Rank 11: sees    nhb= 1 
Rank 11: sees    rnfreq= 4936495 
Rank 12: sees    nhb= 1 
Rank 12: sees    rnfreq= 4936495 
Rank 13: sees    nhb= 1 
Rank 14: sees    nhb= 1 
Rank 14: sees    rnfreq= 4936495 
Rank 15: sees    nhb= 1 
Rank 15: sees    rnfreq= 4936495 
Rank 16: sees    nhb= 1 
Rank 16: sees    rnfreq= 4936495 
Rank 17: sees    nhb= 1 
Rank 17: sees    rnfreq= 4936495 
Rank 18: sees    nhb= 1 
Rank 18: sees    rnfreq= 4936495 
Rank 19: sees    nhb= 1 
Rank 19: sees    rnfreq= 4936495 
Rank 20: sees    nhb= 1 
Rank 20: sees    rnfreq= 4936495 
Rank 21: sees    nhb= 1 
Rank 22: sees    nhb= 1 
Rank 22: sees    rnfreq= 4936495 
Rank 23: sees    nhb= 1 
Rank 23: sees    rnfreq= 4936495 
Rank 24: sees    nhb= 1 
Rank 24: sees    rnfreq= 4936495 
Rank 25: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 25: sees    nhb= 1 
Rank 26: sees    nhb= 1 
Rank 26: sees    rnfreq= 4936495 
Rank 27: sees    nhb= 1 
Rank 28: sees    nhb= 0      nor= 0      rnfreq= 0
Rank 28: sees    nhb= 1 
Rank 28: sees    rnfreq= 4936495 
Rank 29: sees    nhb= 1 
Rank 29: sees    rnfreq= 4936495 
Rank 30: sees    nhb= 1 
Rank 30: sees    rnfreq= 4936495 
Rank 31: sees    nhb= 1 
Rank 31: sees    rnfreq= 4936495 
Rank 4: sees     rnfreq= 4936495 
Rank 7: sees     rnfreq= 4936495 
Rank 13: sees    rnfreq= 4936495 
Rank 21: sees    rnfreq= 4936495 
Rank 25: sees    rnfreq= 4936495 
Rank 27: sees    rnfreq= 4936495 
Rank 6: sees     rnfreq= 4936495 
Rank 8: sees     nor= 10 
Rank 9: sees     nor= 10 
Rank 15: sees    nor= 10 
Rank 18: sees    nor= 10 
Rank 1: sees     nor= 10 
Rank 4: sees     nor= 10 
Rank 5: sees     nor= 10 
Rank 7: sees     nor= 10 
Rank 10: sees    nor= 10 
Rank 12: sees    nor= 10 
Rank 14: sees    nor= 10 
Rank 16: sees    nor= 10 
Rank 19: sees    nor= 10 
Rank 22: sees    nor= 10 
Rank 23: sees    nor= 10 
Rank 24: sees    nor= 10 
Rank 25: sees    nor= 10 
Rank 2: sees     nor= 10 
Rank 3: sees     nor= 10 
Rank 6: sees     nor= 10 
Rank 11: sees    nor= 10 
Rank 13: sees    nor= 10 
Rank 17: sees    nor= 10 
Rank 20: sees    nor= 10 
Rank 21: sees    nor= 10 
Rank 26: sees    nor= 10 
Rank 28: sees    nor= 10 
Rank 29: sees    nor= 10 
Rank 30: sees    nor= 10 
Rank 31: sees    nor= 10 
Rank 27: sees    nor= 10 
Rank 1: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 2: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 4: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 5: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 8: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 9: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 10: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 11: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 12: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 13: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 14: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 16: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 17: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 18: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 20: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 21: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 24: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 25: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 26: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 27: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 28: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 29: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 30: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 31: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 3: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 15: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 19: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 6: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 7: sees     nhb= 1      nor= 10     rnfreq= 4936495
Rank 22: sees    nhb= 1      nor= 10     rnfreq= 4936495
Rank 23: sees    nhb= 1      nor= 10     rnfreq= 4936495

So, for some reason that I can't wrap my head around, the integers get completely changed by MPI_Bcast. This prompts the error:

Error in set_hbc_parms [hbc_parms.c] (error no=1):
Parameters are not global
Program aborted

This means that when set_hbc_parms subsequently passes the numbers around, again using MPI_Bcast, it sees that they differ from the nonsensical values that were broadcast earlier. (I am not posting the code of that function, to keep the question short, but I can provide it on request.)

What is wrong?
