Reputation: 23
I am trying to write a simple method to read a file in parallel, where each process reads a fixed number of ints from the file so that the data is split across the processes. I get a segmentation fault and cannot understand why or how to fix it. Here is the code I wrote:
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define NUM_INTS 5
int main (int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int i;
    int rank,processes,name_len;
    const int root=0;
    int *buf;
    char *filename = "file.txt";
    MPI_File fh;
    MPI_Status status;
    MPI_Offset offset;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &processes);
    MPI_Get_processor_name(processor_name, &name_len);
    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    buf = malloc(NUM_INTS * sizeof(int));
    MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, (char *)NULL, MPI_INFO_NULL);
    offset = rank * NUM_INTS;
    MPI_File_read_at(fh, offset, buf, NUM_INTS, MPI_INT, &status);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_File_close(&fh);
    for (i=0;i<NUM_INTS;i++)
        printf("rank %d data[%d] = %d\n", rank, i, buf[i]);
    free(buf);
    MPI_Finalize();
    return 0;
}
The file contains 10 ints that I have tried to split over 2 processes. I think the problem is in MPI_File_read_at, because all prints work up to that line.
Thanks in advance
Upvotes: 2
Views: 572
Reputation: 5223
Why are you passing "null" to your datatype representation? (in fact, why are you setting the file view at all?)
If you had followed @Colin Cassidy's advice you'd have your back trace pointing directly to the problem: it's not MPI_File_read_at, it's MPI_File_set_view.
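Either delete that line, or change (char *)NULL to "native". Note that if you delete it, the default file view uses MPI_BYTE as its etype, so the offset passed to MPI_File_read_at is then counted in bytes and would need to be rank * NUM_INTS * sizeof(int).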
Also, you should check your return values, but that would not help you here. See my answer to this question: How to use and interpret MPI-IO Error codes?
MPICH (or rather ROMIO) should not segfault on your garbage input. I've got a patch for this under review. It has the funny side effect of making your call to MPI_File_set_view return an error, which you ignore, after which the rest of your code behaves as you wanted it to.
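To make the two suggestions above concrete, here is a minimal sketch of the same read with "native" as the data representation and with every MPI-IO return value checked. The check() helper is a hypothetical name introduced only for this illustration, and the sketch assumes file.txt holds raw binary ints (at least NUM_INTS per process), as the question implies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NUM_INTS 5

/* Hypothetical helper (not part of MPI): print the error text and abort. */
static void check(int err, const char *what)
{
    if (err != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(err, msg, &len);
        fprintf(stderr, "%s failed: %s\n", what, msg);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
}

int main(int argc, char **argv)
{
    int rank, i;
    int buf[NUM_INTS];
    MPI_File fh;
    MPI_Status status;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* File operations return error codes by default, so they can be checked. */
    check(MPI_File_open(MPI_COMM_WORLD, "file.txt", MPI_MODE_RDONLY,
                        MPI_INFO_NULL, &fh), "MPI_File_open");

    /* "native" instead of NULL: the datarep argument must be a valid string. */
    check(MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native",
                            MPI_INFO_NULL), "MPI_File_set_view");

    /* With an MPI_INT etype the offset is counted in ints, not bytes. */
    offset = (MPI_Offset)rank * NUM_INTS;
    check(MPI_File_read_at(fh, offset, buf, NUM_INTS, MPI_INT, &status),
          "MPI_File_read_at");

    check(MPI_File_close(&fh), "MPI_File_close");

    for (i = 0; i < NUM_INTS; i++)
        printf("rank %d data[%d] = %d\n", rank, i, buf[i]);

    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper such as mpicc and started on 2 processes, each rank should read its own block of NUM_INTS values. With the set_view call removed, the offset line would need to become rank * NUM_INTS * sizeof(int), since the default view counts in bytes.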
Upvotes: 3
Reputation: 412
Not being totally aware of the MPI functionality, I'm not 100% sure where your problem is either. However, here are a few general debugging tips you can use to try and track down the issue.
1) Use a debugger. GDB or similar is your friend here. You should be able to use it to step through your program line by line, watching all the variables as you go. This should show you exactly where the segmentation fault is generated and what the values of the variables are at that point, which should greatly help in tracking down the issue.
2) If you have a core dump, you can again use gdb to perform post-mortem debugging and get a stack trace from the point of the crash. This may shed some light on the problem.
3) Failing that, you can always go with the "printf()" debugging method: debug/dump everything as you calculate it. This might help, but I've found in the past that it can cause the memory locations of things to change and sometimes remove the crash... Note that this does not fix your problem; you have simply re-arranged the memory so that it does not crash under the current circumstances.
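Attaching a debugger to one process of a parallel job is less direct than with a serial program. One common trick, sketched below under stated assumptions, is to have each process print its PID and spin until a debugger attaches and flips a flag. The function name and the MPI_DEBUG_WAIT environment variable are hypothetical, chosen only for this illustration; they are not part of MPI or gdb.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: if the (made-up) environment variable MPI_DEBUG_WAIT
 * is set, print this process's PID and loop until a debugger sets
 * `attached` to a non-zero value. Returns 1 if it waited, 0 otherwise.
 */
int wait_for_debugger_if_requested(void)
{
    /* volatile so the compiler re-reads the flag on every iteration */
    volatile int attached = 0;

    if (getenv("MPI_DEBUG_WAIT") == NULL)
        return 0;

    fprintf(stderr, "PID %d waiting for debugger to attach\n", (int)getpid());
    while (!attached)
        sleep(5);

    return 1;
}
```

The idea would be to call this near the start of main, run the job with the variable set, attach with gdb -p <PID>, move up to this function's frame, run set var attached = 1, and then continue or step toward the crash.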
Upvotes: 0