I am running an MPI application on 32 processes.
The stdout of the rank 0 process gets sent to a separate file for startup error logging (we will call this file STARTUP_ERROR), while the stdout of all other processes is sent to a separate logfile.
I am running the Intel distribution of MPI, version 2021.10.0.
My issue is really simple: I am trying to pass around 3 ints from rank 0, which reads them from a text file.
I will show you the output from this code:
...
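int nhb,nor,rnfreq;   /* only assigned on rank 0 below */
if (myrank==0)
{
/* rank 0 reads the three parameters from the input file */
find_section("HBC parameters");
read_line("nhb","%d",&nhb);
read_line("nor","%d",&nor);
read_line("rnfreq","%d",&rnfreq);
}
/* on the other ranks this prints values that were never assigned */
printf("Rank %d: sees\t nhb= %d \t nor= %d \t rnfreq= %d\n", myrank, nhb, nor, rnfreq);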
fflush(stdout);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&nhb,1,MPI_INT,0,MPI_COMM_WORLD);
printf("Rank %d: sees\t nhb= %d \n", myrank, nhb);
fflush(stdout);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&rnfreq,1,MPI_INT,0,MPI_COMM_WORLD);
printf("Rank %d: sees\t rnfreq= %d \n", myrank, rnfreq);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&nor,1,MPI_INT,0,MPI_COMM_WORLD);
printf("Rank %d: sees\t nor= %d \n", myrank, nor);
MPI_Barrier(MPI_COMM_WORLD);
printf("Rank %d: sees\t nhb= %d \t nor= %d \t rnfreq= %d\n", myrank, nhb, nor, rnfreq);
hbc=set_hbc_parms(nhb,nor,rnfreq);
...
Before you scream at me: I know I don't actually need all those barriers and flushes. I have just been fiddling with this for a while trying to understand what's wrong.
If I run this, the file with the rank 0 output contains:
Rank 0: sees nhb= 10 nor= 20 rnfreq= 20
Rank 0: sees nhb= 10
Rank 0: sees rnfreq= 20
Rank 0: sees nor= 20
Rank 0: sees nhb= 10 nor= 20 rnfreq= 20
which is perfectly fine. The other processes, however, print:
Rank 14: sees nhb= 0 nor= 0 rnfreq= 0
Rank 18: sees nhb= 0 nor= 0 rnfreq= 0
Rank 24: sees nhb= 0 nor= 0 rnfreq= 0
Rank 30: sees nhb= 0 nor= 0 rnfreq= 0
Rank 11: sees nhb= 0 nor= 0 rnfreq= 0
Rank 15: sees nhb= 0 nor= 0 rnfreq= 0
Rank 17: sees nhb= 0 nor= 0 rnfreq= 0
Rank 29: sees nhb= 0 nor= 0 rnfreq= 0
Rank 1: sees nhb= 0 nor= 0 rnfreq= 0
Rank 2: sees nhb= 0 nor= 0 rnfreq= 0
Rank 4: sees nhb= 0 nor= 0 rnfreq= 0
Rank 5: sees nhb= 0 nor= 0 rnfreq= 0
Rank 6: sees nhb= 0 nor= 0 rnfreq= 0
Rank 7: sees nhb= 0 nor= 0 rnfreq= 0
Rank 10: sees nhb= 0 nor= 0 rnfreq= 0
Rank 13: sees nhb= 0 nor= 0 rnfreq= 0
Rank 16: sees nhb= 0 nor= 0 rnfreq= 0
Rank 20: sees nhb= 0 nor= 0 rnfreq= 0
Rank 21: sees nhb= 0 nor= 0 rnfreq= 0
Rank 22: sees nhb= 0 nor= 0 rnfreq= 0
Rank 23: sees nhb= 0 nor= 0 rnfreq= 0
Rank 26: sees nhb= 0 nor= 0 rnfreq= 0
Rank 27: sees nhb= 0 nor= 0 rnfreq= 0
Rank 31: sees nhb= 0 nor= 0 rnfreq= 0
Rank 3: sees nhb= 0 nor= 0 rnfreq= 0
Rank 8: sees nhb= 0 nor= 0 rnfreq= 0
Rank 9: sees nhb= 0 nor= 0 rnfreq= 0
Rank 12: sees nhb= 0 nor= 0 rnfreq= 0
Rank 19: sees nhb= 0 nor= 0 rnfreq= 0
Rank 1: sees nhb= 1
Rank 1: sees rnfreq= 4936495
Rank 2: sees nhb= 1
Rank 2: sees rnfreq= 4936495
Rank 3: sees nhb= 1
Rank 3: sees rnfreq= 4936495
Rank 4: sees nhb= 1
Rank 5: sees nhb= 1
Rank 5: sees rnfreq= 4936495
Rank 6: sees nhb= 1
Rank 7: sees nhb= 1
Rank 8: sees nhb= 1
Rank 8: sees rnfreq= 4936495
Rank 9: sees nhb= 1
Rank 9: sees rnfreq= 4936495
Rank 10: sees nhb= 1
Rank 10: sees rnfreq= 4936495
Rank 11: sees nhb= 1
Rank 11: sees rnfreq= 4936495
Rank 12: sees nhb= 1
Rank 12: sees rnfreq= 4936495
Rank 13: sees nhb= 1
Rank 14: sees nhb= 1
Rank 14: sees rnfreq= 4936495
Rank 15: sees nhb= 1
Rank 15: sees rnfreq= 4936495
Rank 16: sees nhb= 1
Rank 16: sees rnfreq= 4936495
Rank 17: sees nhb= 1
Rank 17: sees rnfreq= 4936495
Rank 18: sees nhb= 1
Rank 18: sees rnfreq= 4936495
Rank 19: sees nhb= 1
Rank 19: sees rnfreq= 4936495
Rank 20: sees nhb= 1
Rank 20: sees rnfreq= 4936495
Rank 21: sees nhb= 1
Rank 22: sees nhb= 1
Rank 22: sees rnfreq= 4936495
Rank 23: sees nhb= 1
Rank 23: sees rnfreq= 4936495
Rank 24: sees nhb= 1
Rank 24: sees rnfreq= 4936495
Rank 25: sees nhb= 0 nor= 0 rnfreq= 0
Rank 25: sees nhb= 1
Rank 26: sees nhb= 1
Rank 26: sees rnfreq= 4936495
Rank 27: sees nhb= 1
Rank 28: sees nhb= 0 nor= 0 rnfreq= 0
Rank 28: sees nhb= 1
Rank 28: sees rnfreq= 4936495
Rank 29: sees nhb= 1
Rank 29: sees rnfreq= 4936495
Rank 30: sees nhb= 1
Rank 30: sees rnfreq= 4936495
Rank 31: sees nhb= 1
Rank 31: sees rnfreq= 4936495
Rank 4: sees rnfreq= 4936495
Rank 7: sees rnfreq= 4936495
Rank 13: sees rnfreq= 4936495
Rank 21: sees rnfreq= 4936495
Rank 25: sees rnfreq= 4936495
Rank 27: sees rnfreq= 4936495
Rank 6: sees rnfreq= 4936495
Rank 8: sees nor= 10
Rank 9: sees nor= 10
Rank 15: sees nor= 10
Rank 18: sees nor= 10
Rank 1: sees nor= 10
Rank 4: sees nor= 10
Rank 5: sees nor= 10
Rank 7: sees nor= 10
Rank 10: sees nor= 10
Rank 12: sees nor= 10
Rank 14: sees nor= 10
Rank 16: sees nor= 10
Rank 19: sees nor= 10
Rank 22: sees nor= 10
Rank 23: sees nor= 10
Rank 24: sees nor= 10
Rank 25: sees nor= 10
Rank 2: sees nor= 10
Rank 3: sees nor= 10
Rank 6: sees nor= 10
Rank 11: sees nor= 10
Rank 13: sees nor= 10
Rank 17: sees nor= 10
Rank 20: sees nor= 10
Rank 21: sees nor= 10
Rank 26: sees nor= 10
Rank 28: sees nor= 10
Rank 29: sees nor= 10
Rank 30: sees nor= 10
Rank 31: sees nor= 10
Rank 27: sees nor= 10
Rank 1: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 2: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 4: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 5: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 8: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 9: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 10: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 11: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 12: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 13: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 14: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 16: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 17: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 18: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 20: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 21: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 24: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 25: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 26: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 27: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 28: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 29: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 30: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 31: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 3: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 15: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 19: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 6: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 7: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 22: sees nhb= 1 nor= 10 rnfreq= 4936495
Rank 23: sees nhb= 1 nor= 10 rnfreq= 4936495
So, for some reason I can't wrap my head around, the integers get completely changed by MPI_Bcast. This prompts the error:
Error in set_hbc_parms [hbc_parms.c] (error no=1):
Parameters are not global
Program aborted
This means that when set_hbc_parms subsequently passes the numbers around, also using MPI_Bcast (I won't post its code so the question doesn't get too long, but I can provide it on request), it finds that the numbers differ from the nonsensical values broadcast earlier.
What is wrong?