cpp_noname

Reputation: 2071

Strange Segmentation Fault from MPI

I wrote a simple MPI program to practice with MPI's user-defined datatype functions. The following version produces a segmentation fault.

    #include <mpi.h>
    #include <iostream>

    using namespace std;

    int main( int argc , char ** argv )
    {
        int rank;

        MPI_Datatype newtype;
        MPI_Datatype newertype;

        MPI_Init(&argc,&argv);

        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        int * buffer = new int[16];

        for( int i=0 ; i<16 ; i++ )
        {
            buffer[i] = 0;
        }

        if(rank==0)
        {
            for( int i=0 ; i<16 ; i++ )
            {
                buffer[i] = 9;
            }

            MPI_Send(buffer,3,newertype,1,0,MPI_COMM_WORLD);        

        }else if(rank==1)
        {
            MPI_Recv(buffer,3,newertype,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

            for( int i=0 ; i<16 ; i++ )
            {
                cout << buffer[i] << " ";
            }

            cout << endl;

        }

        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);

        MPI_Finalize();

        return 0;
    }

However, when the array allocation is moved before the MPI_Init call, everything works fine (the moved lines are marked with a comment below):

    #include <mpi.h>
    #include <iostream>

    using namespace std;

    int main( int argc , char ** argv )
    {
        int rank;

        // These lines were moved here, before the call to MPI_Init:
        int * buffer = new int[16];

        for( int i=0 ; i<16 ; i++ )
        {
            buffer[i] = 0;
        }

        MPI_Datatype newtype;
        MPI_Datatype newertype;

        MPI_Init(&argc,&argv);

        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        if(rank==0)
        {
            for( int i=0 ; i<16 ; i++ )
            {
                buffer[i] = 9;
            }

            MPI_Send(buffer,3,newertype,1,0,MPI_COMM_WORLD);

        }else if(rank==1)
        {
            MPI_Recv(buffer,3,newertype,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

            for( int i=0 ; i<16 ; i++ )
            {
                cout << buffer[i] << " ";
            }

            cout << endl;

        }

        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);

        MPI_Finalize();

        return 0;
    }

Can anyone explain what is wrong with declaring an array after the MPI_Init call?

For your information, the program output and the error message are below:

    9 9 9 9 0 0 9 9 9 9 0 0 9 9 9 9
    [linuxscc003:10019] *** Process received signal ***
    [linuxscc003:10019] Signal: Segmentation fault (11)
    [linuxscc003:10019] Signal code: Address not mapped (1)
    [linuxscc003:10019] Failing at address: 0x7fa00d0b36c8
    [linuxscc003:10019] [ 0] /lib64/libpthread.so.0() [0x3abf80f500]
    [linuxscc003:10019] [ 1] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x299) [0x7f980ce46509]
    [linuxscc003:10019] [ 2] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0xe7b2b) [0x7f980ce46b2b]
    [linuxscc003:10019] [ 3] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0xf0a60) [0x7f980ce4fa60]
    [linuxscc003:10019] [ 4] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(mca_base_param_finalize+0x41) [0x7f980ce4f731]
    [linuxscc003:10019] [ 5] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(opal_finalize_util+0x1b) [0x7f980ce3f53b]
    [linuxscc003:10019] [ 6] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0x4ce35) [0x7f980cdabe35]
    [linuxscc003:10019] [ 7] type_contiguous(main+0x1aa) [0x408f2e]
    [linuxscc003:10019] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3abec1ecdd]
    [linuxscc003:10019] [ 9] type_contiguous() [0x408cc9]
    [linuxscc003:10019] *** End of error message ***
    --------------------------------------------------------------------------
    mpiexec noticed that process rank 1 with PID 10019 on node linuxscc003 exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    Failure executing command /opt/MPI/openmpi-1.5.3/linux/gcc/bin/mpiexec -x  LD_LIBRARY_PATH -x  PATH -x  OMP_NUM_THREADS -x  MPI_NAME --hostfile /tmp/hostfile-9252 -np 2 type_contiguous

Upvotes: 1

Views: 3231

Answers (1)

Hristo Iliev

Reputation: 74495

newertype has 3 segments, each consisting of 2 elements of newtype, laid out with a stride of 3. Its extent is therefore (3-1)*3 + 2 = 8 elements of newtype, and since each newtype is two consecutive MPI_INT elements, one newertype spans 16 integers. You can see this in the output of rank 1: the 16-integer buffer holds exactly the pattern of the first newertype element (4 nines, 2 zeros, 4 nines, 2 zeros, 4 nines).

You are sending and receiving 3 elements of that type. Consecutive elements are placed one extent apart in memory, so the operation accesses 3 * 16 = 48 integers starting at buffer. You only allocate 16, so the MPI_Recv in rank 1 writes 32 integers past the end of the allocated buffer, most likely overwriting heap control structures.

Moving the allocation before the call to MPI_Init changes the order of those structures in memory, and your code then overwrites something different but not critical. The code is still incorrect and you are just lucky that it doesn't segfault. Use larger buffers (at least 48 integers).
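Here is a minimal sketch of a corrected version (illustrative only; it keeps your datatypes but asks the library for the extent with MPI_Type_get_extent instead of hard-coding the buffer length, and the variable names count and nints are my own):

    #include <mpi.h>
    #include <iostream>

    int main( int argc , char ** argv )
    {
        int rank;
        MPI_Datatype newtype, newertype;

        MPI_Init(&argc,&argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        // Query the extent of one newertype element (64 bytes here)
        // instead of computing it by hand.
        MPI_Aint lb, extent;
        MPI_Type_get_extent(newertype, &lb, &extent);

        // Sending count elements touches count * extent bytes from the
        // start of the buffer: 3 * 64 / 4 = 48 integers for this type.
        const int count = 3;
        const int nints = static_cast<int>(count * extent / sizeof(int));

        int * buffer = new int[nints]();  // zero-initialized

        if (rank == 0)
        {
            for (int i = 0; i < nints; i++)
                buffer[i] = 9;
            MPI_Send(buffer, count, newertype, 1, 0, MPI_COMM_WORLD);
        }
        else if (rank == 1)
        {
            MPI_Recv(buffer, count, newertype, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            for (int i = 0; i < nints; i++)
                std::cout << buffer[i] << " ";
            std::cout << std::endl;
        }

        delete [] buffer;
        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);
        MPI_Finalize();
        return 0;
    }

With your datatypes, MPI_Type_get_extent reports a lower bound of 0 and an extent of 64 bytes, so nints comes out to 48 and both the send and the receive stay within the allocation.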

Upvotes: 4

Related Questions