Reputation: 456
I want to scatter a big 1D array to all processes. The array is really a matrix with 2.5 million rows and 115 columns of double-precision values, and I observed that once the size grows beyond roughly 2.5 * 115 million elements, some processes report a segmentation fault. With 1 * 115 million or 2 * 115 million elements everything works fine.
The error messages look like:
Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1123f800)
The code snippet is:
MPI_Init(&argc, &argv);

int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

double *pointData = NULL;
double *fracPointData = NULL;
long pointDataEleTotal = numRows * numColumn;
long fracPointDataEleTotal = pointDataEleTotal / size;

if (rank == 0) {
    pointData = (double *)malloc(sizeof(double) * pointDataEleTotal);
    /* ... initialize pointData ... */
    fracPointData = (double *)malloc(sizeof(double) * fracPointDataEleTotal);
}
if (rank != 0) {
    fracPointData = (double *)malloc(sizeof(double) * fracPointDataEleTotal);
}

MPI_Scatter(pointData, fracPointDataEleTotal, MPI_DOUBLE,
            fracPointData, fracPointDataEleTotal, MPI_DOUBLE,
            0, MPI_COMM_WORLD);
I have tried and observed the following:
ulimit -s reports unlimited, so the stack size limit does not seem to be the problem.
When running with 8 processes, some processes report Segmentation fault: address not mapped to object at address (nil), while the 3rd and 5th processes report Segmentation fault: address not mapped to object at address 0x1123f800.
This strange problem has confused me for a few days; I hope to get some useful advice from you. Much appreciated.
Upvotes: 0
Views: 146
Reputation: 1630
It is quite possible that your MPI implementation (which should have been explicitly mentioned in your question so that we can actually have a look) internally converts your data to bytes and then operates on bytes. A plain C integer can index up to 2,147,483,648 bytes. That corresponds to 268,435,456 8-byte double elements. Or, in your matrix dimensions, approximately 2.33 × 115 million, coinciding with the threshold where you begin to experience problems.
To give an explicit example, Open MPI was at one point internally serializing datatypes to bytes for MPI_Sendrecv and then recursively calling MPI_Sendrecv on the same data with a correspondingly larger (but still int) count argument, which caused overflows. It may well be that your MPI has a similar issue with MPI_Scatter.
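To see the numbers concretely, here is a small standalone check (my own sketch, not part of the original answer; the matrix dimensions are taken from the question) showing that the full buffer, once expressed in bytes, no longer fits in a 32-bit int:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* dimensions from the question: 2.5 million rows x 115 columns */
    long numRows = 2500000, numColumn = 115;
    long elements = numRows * numColumn;            /* 287,500,000 doubles */
    long bytes = elements * (long)sizeof(double);   /* 2,300,000,000 bytes */

    printf("bytes = %ld, INT_MAX = %d, fits in int: %s\n",
           bytes, INT_MAX, bytes <= INT_MAX ? "yes" : "no");
    return 0;
}

With 1 or 2 million rows the byte count stays below INT_MAX, which matches the observation that those sizes work.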
Upvotes: 0
Reputation: 5794
I suspect that you are hitting the 2.14 billion limit on ints, even though you state that you are a factor of 10 away from it. Your quantity fracPointDataEleTotal is a long, but the MPI routines only take an int for the number of elements sent. Test whether your number fits in an int; otherwise you're out of luck...
...unless you install the latest MPICH, which supports MPI-4. That standard adds routines such as MPI_Send_c and (for your case) MPI_Scatter_c that take an MPI_Count parameter for the count, which is a much larger type.
(Also, just to be sure, test that your malloc actually succeeds and you're not running out of memory!)
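For illustration, here is a minimal sketch (my own, not tested against your setup; it assumes an MPI-4 implementation such as a recent MPICH and reuses the question's variable names and matrix dimensions) of the same scatter written with MPI_Scatter_c, with the malloc results checked:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long numRows = 2500000, numColumn = 115;   /* dimensions from the question */
    MPI_Count pointDataEleTotal = (MPI_Count)numRows * numColumn;
    MPI_Count fracPointDataEleTotal = pointDataEleTotal / size;

    double *pointData = NULL;
    double *fracPointData = malloc(sizeof(double) * fracPointDataEleTotal);
    if (rank == 0)
        pointData = malloc(sizeof(double) * pointDataEleTotal);

    /* check that the allocations actually succeeded */
    if (fracPointData == NULL || (rank == 0 && pointData == NULL)) {
        fprintf(stderr, "rank %d: malloc failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    /* ... rank 0 initializes pointData here ... */

    /* MPI_Scatter_c takes MPI_Count counts, so there is no truncation to int */
    MPI_Scatter_c(pointData, fracPointDataEleTotal, MPI_DOUBLE,
                  fracPointData, fracPointDataEleTotal, MPI_DOUBLE,
                  0, MPI_COMM_WORLD);

    free(fracPointData);
    free(pointData);
    MPI_Finalize();
    return 0;
}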
Upvotes: 1