Reputation: 21
I am getting an error with MPI_Bcast (I think it is an old one) and I am not sure why it is happening. The error is as follows:
An error occurred in MPI_Bcast
on communicator MPI_COMM_WORLD
MPI_ERR_TRUNCATE: message truncated
MPI_ERRORS_ARE_FATAL: your MPI job will now abort
The code where it happens is:
for (int i = 0; i < nbProcs; i++) {
    for (int j = firstLocalGrainRegion; j < lastLocalGrainRegion; j++) {
        GrainRegion * grainRegion = microstructure->getGrainRegionAt(j);
        int grainSize = grainRegion->getBoxSize(nb);
        double * newValues;
        if (myId == i)
            newValues = grainRegion->getNewValues();
        else
            newValues = new double[grainSize];
        MPI_Bcast(newValues, grainSize, MPI_DOUBLE, i, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
        if (myId != i)
            grainRegion->setNewValues(newValues);
    }
}
Upvotes: 1
Views: 4556
Reputation: 74475
There are two possible reasons for the error.
The first one is that you have a pending previous MPI_Bcast, started somewhere before the outer loop, which did not complete, e.g. in a manner similar to the one in this question.
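A minimal self-contained illustration of that first scenario (not your code, just a sketch): rank 0 posts a broadcast that the other ranks never match, so the next broadcast they do post gets paired with it and the buffer sizes no longer agree.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double big[100];
    int small[2] = {0, 0};

    if (rank == 0) {
        for (int k = 0; k < 100; k++) big[k] = k;
        // Stray broadcast: only the root posts it.
        MPI_Bcast(big, 100, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    // On the non-root ranks this is their first broadcast, so it pairs
    // with the 100-double message above, which does not fit into a
    // 2-int buffer; Open MPI typically aborts with MPI_ERR_TRUNCATE.
    MPI_Bcast(small, 2, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}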
The second one is a possible buffer size mismatch because of grainRegion->getBoxSize(nb) returning different values in different processes. You can examine the code with a parallel debugger or just put a print statement before the broadcast, for example:
int grainSize = grainRegion->getBoxSize(nb);
printf("i=%d j=%d rank=%02d grainSize=%d\n", i, j, myId, grainSize);
With this particular output format, you should be able to simply run the output through sort and then quickly find mismatched values. Because of the barrier, which is always synchronising (the broadcast might not necessarily be so), it is hardly possible for the different calls to MPI_Bcast to interfere with one another as in the first possible case.
If it so happens that your data structure is distributed and the correct value of grainSize is only available at the broadcast root process, then you should first notify the other ranks of the correct size. The simplest (but not the most efficient) solution would be to broadcast grainSize first. A better solution would be to first perform an MPI_Allgather with the number of grain regions at each process (only if necessary), and then perform an MPI_Allgatherv with the sizes of each region.
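For the simplest fix, here is a minimal sketch of the inner loop, reusing the names from the code in the question (error handling omitted): the root broadcasts grainSize first, so every rank can allocate a matching receive buffer before the data broadcast.

for (int j = firstLocalGrainRegion; j < lastLocalGrainRegion; j++) {
    GrainRegion * grainRegion = microstructure->getGrainRegionAt(j);

    // Only the root's size is authoritative; distribute it first.
    int grainSize = 0;
    if (myId == i)
        grainSize = grainRegion->getBoxSize(nb);
    MPI_Bcast(&grainSize, 1, MPI_INT, i, MPI_COMM_WORLD);

    // Now every rank posts a receive buffer of the right size.
    double * newValues = (myId == i) ? grainRegion->getNewValues()
                                     : new double[grainSize];
    MPI_Bcast(newValues, grainSize, MPI_DOUBLE, i, MPI_COMM_WORLD);

    if (myId != i)
        grainRegion->setNewValues(newValues);
}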
Upvotes: 2