Reputation: 35
I'm building a Monte Carlo simulation in C with MPI and I'm running into a strange error when reading files through a struct. I've reproduced the problem in the simple code below, which fails in the same way as the much larger simulation. Below are the contents of main.c. readme.txt contains just one short line of text.
#include <mpi.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct test_struct {
    char * filename;
} test_struct;

int read(struct test_struct * obj){
    FILE * file = fopen(obj->filename, "r");
    char buf[512];
    if (file == NULL) return -1;
    else {
        fgets(buf, sizeof(buf), file);
        printf("%s\n", buf);
    }
    fclose(file);
    return 0;
}

int main() {
    MPI_Init(NULL, NULL);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    struct test_struct obj;
    obj.filename = (char *) malloc(256*sizeof(char));
    strcpy(obj.filename, "readme.txt");
    printf("%s\n", obj.filename);
    read(&obj);
    free(obj.filename);
    return 0;
}
I compile with this simple command: mpicc -g main.c. When I run the executable I get the following error message.
→ ./a.out
[lap-johnson:00190] *** Process received signal ***
[lap-johnson:00190] Signal: Segmentation fault (11)
[lap-johnson:00190] Signal code: Address not mapped (1)
[lap-johnson:00190] Failing at address: 0x7
[lap-johnson:00190] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f2023741730]
[lap-johnson:00190] [ 1] ./a.out(read+0x19)[0x7f20238a41ae]
[lap-johnson:00190] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_singleton.so(+0x2e77)[0x7f20225e2e77]
[lap-johnson:00190] [ 3] /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x7f20234aa11a]
[lap-johnson:00190] [ 4] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x7f202379be62]
[lap-johnson:00190] [ 5] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0xa9)[0x7f20237ca1b9]
[lap-johnson:00190] [ 6] ./a.out(+0x1211)[0x7f20238a4211]
[lap-johnson:00190] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f202358409b]
[lap-johnson:00190] [ 8] ./a.out(+0x10da)[0x7f20238a40da]
[lap-johnson:00190] *** End of error message ***
[1] 190 segmentation fault (core dumped) ./a.out
I have tried to use gdb to see what's going on with the error. It says that obj, the instance of the test_struct, is at memory address 0x7. I think the program segfaults because it's trying to read this address, which is invalid. The gdb output is below.
→ gdb ./a.out
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...done.
(gdb) run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 257]
Program received signal SIGSEGV, Segmentation fault.
0x00000000080011be in read (obj=0x7) at main.c:11
11 FILE * file = fopen(obj->filename, "r");
(gdb) print obj
$1 = (struct test_struct *) 0x7
(gdb) print obj->filename
Cannot access memory at address 0x7
Why would the read function see the struct at memory address 0x7? I could be doing something wrong (or non-standard) with the string manipulation, but I can't figure out how to fix this issue. Note that this compiles and runs perfectly with gcc (if I remove the MPI stuff, of course).
I did hear something about how MPI doesn't like structs with pointers as members. But I think that was in the context of sending and receiving. Any help with this issue is appreciated. I am pretty new to MPI.
I am running Open MPI version 3.1.3 on Debian inside of Windows Subsystem for Linux (4.4.0-19041-Microsoft). I've confirmed that the same issue occurs on my Debian Linux machine using a custom build of Open MPI version 2.1.1.
Upvotes: 2
Views: 129
Reputation: 8395
read() is a function provided by libc, and you should not redefine it. Instead, just rename this function in your code.
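Open MPI intends to call read() from libc during MPI_Init, but the call resolves to your function of the same name instead, hence the bizarre stack trace (read+0x19 sits directly below the Open MPI frames). The libc read() takes a file descriptor as its first argument, so the obj=0x7 you see in gdb is most likely a file descriptor being interpreted as a struct pointer.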
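As a minimal sketch of the rename suggested above: the name read_first_line, the caller-supplied buffer, and the static qualifier are my own assumptions and not part of the original code. Renaming alone removes the clash with libc; static is an extra precaution that keeps the symbol local to the file. The MPI calls are left out so the helper can be built with a plain C compiler.

```c
#include <stdio.h>
#include <string.h>

typedef struct test_struct {
    char *filename;
} test_struct;

/* Hypothetical replacement for the question's read().
 * The new name no longer collides with libc's read(), and `static`
 * keeps the symbol private to this translation unit.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
static int read_first_line(const test_struct *obj, char *buf, size_t len)
{
    FILE *file = fopen(obj->filename, "r");
    if (file == NULL)
        return -1;                      /* could not open the file */

    if (fgets(buf, (int)len, file) == NULL) {
        fclose(file);
        return -1;                      /* empty file or read error */
    }
    fclose(file);

    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline, if any */
    return 0;
}
```

In the question's main(), the call read(&obj) would then become something like read_first_line(&obj, buf, sizeof buf) with a local char buf[512], followed by a printf of buf if the call returns 0.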
Upvotes: 2