kenan

Reputation: 103

MPI_Comm_size Segmentation fault

Hello, everyone. I get these errors when running a parallel program with MPI and OpenMP on Linux:

 [node65:03788] *** Process received signal ***
 [node65:03788] Signal: Segmentation fault (11)
 [node65:03788] Signal code: Address not mapped (1)
 [node65:03788] Failing at address: 0x44000098
 [node65:03788] [ 0] /lib64/libpthread.so.0 [0x2b663e446c00]
 [node65:03788] [ 1] /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0(MPI_Comm_size+0x60) [0x2b663d694360]
 [node65:03788] [ 2] fdtd_3D_xyzPML_MPI_OpenMP(main+0xaa) [0x42479a]
 [node65:03788] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b663e56f184]
 [node65:03788] [ 4] fdtd_3D_xyzPML_MPI_OpenMP(_ZNSt8ios_base4InitD1Ev+0x39) [0x405d79]
 [node65:03788] *** End of error message ***
 -----------------------------------------------------------------------------
 mpirun noticed that process rank 2 with PID 3787 on node node65 exited on signal 11 (Segmentation fault).
 -----------------------------------------------------------------------------

After analyzing the core files, I get the following message:

[Thread debugging using libthread_db enabled]
[New Thread 47310344057648 (LWP 26962)]
[New Thread 1075841344 (LWP 26966)]
[New Thread 1077942592 (LWP 26967)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47310344057648 (LWP 26962)]
0x00002b074afb3360 in PMPI_Comm_size () from /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0

What causes this? Thanks for your help.

The code (test.cpp) is as follows, so you can try it yourself:

#include <stdio.h> 
#include <stdlib.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int nprocs = 1;   // the number of processes
    int myrank = 0;
    int provided;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided != MPI_THREAD_FUNNELED)
    {
        printf("required %d != provided %d\n", MPI_THREAD_FUNNELED, provided);
        MPI_Finalize();
        return 0;
    }

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    int num_threads = 16;   // OpenMP
    omp_set_dynamic(1);
    omp_set_num_threads(num_threads);

    #pragma omp parallel
    {
        printf("%d omp thread from %d mpi process\n", omp_get_thread_num(), myrank);
    }

    MPI_Finalize();
    return 0;
}

Upvotes: 3

Views: 1952

Answers (1)

Cimbali

Reputation: 11395

Well, this is probably not much, or even a bit of a lame answer, but I had this problem when mixing up different MPI installations (an Open MPI and an MVAPICH2, to be precise).

Here are a few things to check:

  • against what version of MPI you linked
ldd <application> | grep -i mpi
    libmpi.so.1 => /usr/lib64/mpi/gcc/openmpi/lib64/libmpi.so.1 (0x00007f90c03cc000)
  • what version of MPI is dynamically loaded
echo $LD_LIBRARY_PATH | tr : "\n" | grep -i mpi
/usr/lib64/mpi/gcc/openmpi/lib64
  • whether you override this dynamic loading (this variable should be empty, unless you know what you're doing)
echo $LD_PRELOAD 
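The three checks above can be bundled into one small script. This is only a sketch: the helper name `check_mpi_env` is made up, and `/bin/ls` merely stands in for the real binary (the question's `fdtd_3D_xyzPML_MPI_OpenMP`).

```shell
# Sketch: run all three environment checks in one go.
# check_mpi_env is a made-up helper name; pass the path to the
# application you are debugging (e.g. ./fdtd_3D_xyzPML_MPI_OpenMP).
check_mpi_env() {
    app=$1
    echo "== MPI libraries linked into $app =="
    ldd "$app" | grep -i mpi || echo "(none linked)"
    echo "== MPI directories in LD_LIBRARY_PATH =="
    printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr : '\n' | grep -i mpi || echo "(none)"
    echo "== LD_PRELOAD (should normally be empty) =="
    echo "${LD_PRELOAD:-(empty)}"
}

check_mpi_env "${1:-/bin/ls}"   # /bin/ls is just a placeholder target
```

If the directory printed in the first section does not match the one in the second, the binary was linked against one MPI installation but will load another at run time, which is exactly the kind of mismatch that crashes inside MPI_Comm_size.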

If that's all OK, you need to check that each library you linked against and that itself relies on MPI was also linked with the same version. If no other library is linked to MPI, the following command should print nothing.

ldd <application> | sed "s/^\s*\(.*=> \)\?//;s/ (0x[0-9a-fA-F]*)$//" | xargs -L 1 ldd | grep -i mpi

If something suspect does show up, say libmpich.so.3 => /usr/lib64/mpi/gcc/MVAPICH2/1.8.1/lib/libmpich.so.3 for example, remove the -L 1 and replace the grep with something to visualize the output (nothing, or less, or vim - ...), then search for that suspect line to see which library pulls it in.
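The same dependency walk can be written as a small script that names the offending library directly. Again a sketch under stated assumptions: `check_indirect_mpi` is a hypothetical helper name, and `/bin/ls` is a stand-in for the real application.

```shell
# Sketch: for every shared library the application links against,
# run ldd on that library too, and report any that drag in MPI.
# check_indirect_mpi is a made-up helper name.
check_indirect_mpi() {
    app=$1
    ldd "$app" \
      | sed 's/^\s*\(.*=> \)\?//; s/ (0x[0-9a-fA-F]*)$//' \
      | while read -r lib; do
            [ -f "$lib" ] || continue   # skip vdso / loader pseudo-entries
            if ldd "$lib" 2>/dev/null | grep -qi mpi; then
                echo "$lib pulls in MPI:"
                ldd "$lib" | grep -i mpi
            fi
        done
}

check_indirect_mpi "${1:-/bin/ls}"   # silent when no dependency uses MPI
```

Any line this prints identifies a dependency built against a different MPI, which is the mixed-installation situation described above.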

Upvotes: 4
