I'm implementing a parallel version of the Game of Life using MPI and getting a segmentation fault (signal 11). I'm new to MPI, and couldn't get valgrind to tell me exactly where the error is. I simplified my code and narrowed the problem down to the marked snippet.
Edit: I've marked the block of code where the problem exists.
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
if(argc!=5)
printf("Incorrect number of arguments.\n");
else{
// program logic here
int m, n, sum, pid, nprocs;
char outfilename[16];
FILE *outfile;
FILE *infile;
int r=atoi(argv[3]);
int c=atoi(argv[4]);
int gens=atoi(argv[2]);
int **old, **new, *new1d, *old1d;
int i,j;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&pid); //initializing MPI here
//prevented segmentation error by using atoi
//domain decomposition start
// problem arisis here
int seg=c/nprocs; //divide by width
int ghost=seg+1;
int row=r+1;
int tsize=ghost*row; //ghost cells
old1d = malloc(tsize*sizeof(int));
new1d = malloc(tsize*sizeof(int));
old = malloc(r*sizeof(int*));
new = malloc(r*sizeof(int*));
for(i=0; i<ghost; i++){
old[i] = &old1d[i*row];
new[i] = &new1d[i*row];
}
// problem ends
if(pid==0){
MPI_Send(&old[0][seg], c, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Recv(&old[0][ghost],c, MPI_INT, 1, 1, MPI_COMM_WORLD, &status);
MPI_Send(&old[0][1], c, MPI_INT, 1, 2, MPI_COMM_WORLD);
MPI_Recv(&old[0][0], c, MPI_INT, 1, 3, MPI_COMM_WORLD, &status);
}
else{
MPI_Recv(&old[0][0], c, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
MPI_Send(&old[0][1], c, MPI_INT, 0, 1, MPI_COMM_WORLD);
MPI_Recv(&old[0][ghost],c, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
MPI_Send(&old[0][seg], c, MPI_INT, 0, 3, MPI_COMM_WORLD);
}
infile=fopen(argv[1],"r");
if(infile==NULL){
printf("Could not locate file.\n");
exit(1);
}
while(fscanf(infile,"%d %d",&m, &n)!=EOF){
old[m][n]=1;
}
fclose(infile);
//repeat for number of generations
for(n=0; n<gens; n++){
for(i=1; i<=r; i++){
for(j=1; j<=c; j++){
sum = old[i-1][j-1] + old[i-1][j] + old[i-1][j+1]
+ old[i][j-1] + old[i][j+1]
+ old[i+1][j-1] + old[i+1][j] + old[i+1][j+1];
if(sum==2 || sum==3)
new[i][j]=1;
else
new[i][j]=0;
}
}
//copying old state into new state
for(i=1; i<=r; i++){
for(j=1; j<=c; j++){
old[i][j] = new[i][j];
}
}
}
//create new output file
sprintf(outfilename,"output_%d",pid);
outfile=fopen(outfilename,"w");
for(i=1; i<=r; i++){
for(j=1; j<=c; j++){
if(new[i][j]==1){
fprintf(outfile,"%d\t%d\n",i ,j);
printf("%d %d",i,j);
}
}
}
fclose(outfile);
MPI_Finalize();
}
return 0;
}
Edit: The input file life.data.1 has X Y co-ordinates that denote live cells.
As wildplasser hinted at in a comment, the crash stems from this loop (which is past the point where you indicated the problem arises):
for(i=1; i<=r; i++){
for(j=1; j<=c; j++){
sum = old[i-1][j-1] + old[i-1][j] + old[i-1][j+1]
+ old[i][j-1] + old[i][j+1] //This line causes a segfault
+ old[i+1][j-1] + old[i+1][j] + old[i+1][j+1]; //This line causes a segfault
if(sum==2 || sum==3)
new[i][j]=1; //This line causes a segfault
else
new[i][j]=0; //This line causes a segfault
}
}
All of the lines indicated by a comment cause a segfault because of the indices being accessed. Your loop gives i and j values 1 through r and 1 through c inclusive, respectively. However, it is only valid to access elements 0 through r-1 of the old array and elements 0 through r-1 of the new array. To fix this, we can change the outer loop to
for(i = 1; i < r-1; i++){
//...code here...
}
Notably, this will not set every value in new. It's possible you meant to declare old to be size row, which is r+1. In that case, you could make the continuation condition i < r in this loop; I'm not sure that's what you intended, though. This handles the errors arising from accessing elements of old.
However, there are still issues arising from the use of old and old1d. old has r (250) elements in it, but only ghost (63) entries are initialized in this case. old1d never has any of its non-live values initialized, and those uninitialized values are sent with MPI_Send and MPI_Recv. You need to pick how large the old array is and make sure to initialize all of the values in it, and likewise pick how large the old1d array is and make sure all of its values are initialized. The same goes for new and new1d.
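For example, here is a minimal sketch of one consistent choice (not necessarily the decomposition you intended): each process holds seg interior columns plus two ghost columns, and r interior rows plus two ghost rows, with everything zero-initialized by calloc. The names rows and cols are mine, not from your code:

int rows = r + 2;    /* r interior rows plus top and bottom ghost rows      */
int cols = seg + 2;  /* seg interior columns plus left and right ghost cols */

/* calloc zero-initializes, so every cell starts out dead */
old1d = calloc((size_t)rows * cols, sizeof(int));
new1d = calloc((size_t)rows * cols, sizeof(int));
old = malloc(rows * sizeof(int*));
new = malloc(rows * sizeof(int*));
for(i = 0; i < rows; i++){  /* set every row pointer, not just the first ghost of them */
    old[i] = &old1d[i * cols];
    new[i] = &new1d[i * cols];
}

With this layout, every index the stencil loop touches is backed by allocated, initialized memory, and the allocation of the row-pointer arrays agrees with the allocation of the underlying 1-D buffers.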
By changing the loops so that old and new are always indexed between 0 and ghost-1 inclusive, and changing all of the loops so that each element of old and new (that is, old[i] and new[i]) is always indexed between 0 and r-1 inclusive, the program no longer crashes. This is done by keeping i < ghost (or < ghost-1 in the main loop) and keeping j < r (or, similarly, < r-1) in each loop. However, this almost certainly does not give the behavior you want. It looks like the current version of your loops is written for a serial program and ignores the parallelism you've tried to introduce by splitting the columns up into seg-sized pieces. The loops need to be redesigned so that each process works only on its own piece, and processes need to communicate so that adjacent cells owned by different processes see each other's values each generation. This requires a significant overhaul of the program.
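To give an idea of the shape this could take, here is a sketch only, built on the rows/cols layout suggested above; col_type, left, and right are names I made up. A column is not contiguous in the row-major old1d buffer, so an MPI_Type_vector is used to describe one:

/* Describe one grid column: 'rows' ints, one per row, 'cols' ints apart */
MPI_Datatype col_type;
MPI_Type_vector(rows, 1, cols, MPI_INT, &col_type);
MPI_Type_commit(&col_type);

/* Edge ranks talk to MPI_PROC_NULL, which makes those calls no-ops */
int left  = (pid == 0)          ? MPI_PROC_NULL : pid - 1;
int right = (pid == nprocs - 1) ? MPI_PROC_NULL : pid + 1;

/* Each generation: send my last interior column right while receiving
   my left ghost column, then send my first interior column left while
   receiving my right ghost column */
MPI_Sendrecv(&old1d[seg],     1, col_type, right, 0,
             &old1d[0],       1, col_type, left,  0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
MPI_Sendrecv(&old1d[1],       1, col_type, left,  1,
             &old1d[seg + 1], 1, col_type, right, 1,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

Because sends and receives to MPI_PROC_NULL are no-ops, the edge ranks need no special casing, and using MPI_Sendrecv instead of separate MPI_Send/MPI_Recv calls avoids the deadlocks that hand-ordered sends and receives can cause.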
There are two more issues that I want to point out:
The program hangs if you run it on more than 2 processes, because the sends and receives are hard-coded to communicate only between process 0 and process 1. You need to make sure that processes with rank 2 and above either don't try to send/receive, or actually have a matching partner when they do; the MPI_PROC_NULL pattern in the sketch above handles this automatically.
When you compute seg = c/nprocs, the remainder of this division is ignored. There are 250 columns, but seg becomes 250/4 = 62 (by integer division), and 62*4 = 248, which is less than the full number of columns. You need to deal with a division of columns that doesn't come out even; one option is sketched below.
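A common way to handle the remainder (a sketch, and only one of several reasonable options) is to give the first c % nprocs ranks one extra column each:

int base = c / nprocs;                  /* 250/4 = 62          */
int rem  = c % nprocs;                  /* 250%4 = 2 left over */
int seg  = base + (pid < rem ? 1 : 0);  /* ranks 0 and 1 get 63 columns,
                                           ranks 2 and 3 get 62, for 250 total */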