Reputation: 103
Mhm,Hello,everyone.I get these errors when running parallel program wiht MPI and OpenMP in Linux,
[node65:03788] *** Process received signal ***
[node65:03788] Signal: Segmentation fault (11)
[node65:03788] Signal code: Address not mapped (1)
[node65:03788] Failing at address: 0x44000098
[node65:03788] [ 0] /lib64/libpthread.so.0 [0x2b663e446c00]
[node65:03788] [ 1] /public/share/mpi/openmpi- 1.4.5//lib/libmpi.so.0(MPI_Comm_size+0x60) [0x2b663d694360]
[node65:03788] [ 2] fdtd_3D_xyzPML_MPI_OpenMP(main+0xaa) [0x42479a]
[node65:03788] [ 3] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b663e56f184]
[node65:03788] [ 4] fdtd_3D_xyzPML_MPI_OpenMP(_ZNSt8ios_base4InitD1Ev+0x39) [0x405d79]
[node65:03788] *** End of error message ***
-----------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 3787 on node node65 exited on signal 11 (Segmentation fault).
-----------------------------------------------------------------------------
After I analysis the core files,I get following message:
[Thread debugging using libthread_db enabled]
[New Thread 47310344057648 (LWP 26962)]
[New Thread 1075841344 (LWP 26966)]
[New Thread 1077942592 (LWP 26967)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47310344057648 (LWP 26962)]
0x00002b074afb3360 in PMPI_Comm_size () from /public/share/mpi/openmpi-1.4.5//lib/libmpi.so.0
what causes these? Thanks for your help
the code(test.cpp) is as follows,and you can have a try:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include "mpi.h"
int main(int argc, char* argv[])
{
int nprocs = 1; //the number of processes
int myrank = 0;
int provide;
MPI_Init_thread(&argc,&argv,MPI_THREAD_FUNNELED,&provide);
if (MPI_THREAD_FUNNELED != provide)
{
printf ("%d != required %d", MPI_THREAD_FUNNELED, provide);
return 0;
}
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
int num_threads = 1; //Openmp
omp_set_dynamic(1);
num_threads = 16;
omp_set_num_threads(num_threads);
#pragma omp parallel
{
printf ("%d omp thread from %d mpi process\n", omp_get_thread_num(), myrank);
}
MPI_Finalize();
}
Upvotes: 3
Views: 1952
Reputation: 11395
Well, this is probably not much, or even a bit of a lame answer, but I had this problem when mixing up different MPI installations (an OpenMPI and a MVAPICH2 to be precise).
Here are a few things to check
ldd <application> | grep -i mpi
libmpi.so.1 => /usr/lib64/mpi/gcc/openmpi/lib64/libmpi.so.1 (0x00007f90c03cc000)
echo $LD_LIBRARY_PATH | tr : "\n" | grep -i mpi
/usr/lib64/mpi/gcc/openmpi/lib64
echo $LD_PRELOAD
If that's all OK, you need to check that each library you linked to and that relies on MPI was also linked with the same version. If no other library is linked to MPI, nothing should appear.
ldd <application> | sed "s/^\s*\(.*=> \)\?//;s/ (0x[0-9a-fA-F]*)$//" | xargs -L 1 ldd | grep -i mpi
If something suspect does show up, say libmpich.so.3 => /usr/lib64/mpi/gcc/MVAPICH2/1.8.1/lib/libmpich.so.3
for example, you should remove the -L 1 and replace grep with something to visualize (nothing ? or less
, or vim -
...), then search for that suspect line.
Upvotes: 4