Reputation: 173
I'm trying to use MPI to multiply two n×n matrices. The second matrix (bb) is broadcast to all "slaves", and each slave is then sent a row of the first matrix (aa) to compute one row of the product. The slave sends its answer back to the master process, which stores it in the product matrix (cc1 in the code below). For some reason I'm getting the error:
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
I believe that the master process is receiving all the messages sent by the slave processes and vice versa, so I'm not sure what is going on here. Any ideas?
Main:
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/times.h>
#define min(x, y) ((x)<(y)?(x):(y))
#define MASTER 0
double* gen_matrix(int n, int m);
int mmult(double *c, double *a, int aRows, int aCols, double *b, int bRows, int bCols);
int main(int argc, char* argv[]) {
    int nrows, ncols;
    double *aa; /* the A matrix */
    double *bb; /* the B matrix */
    double *cc1; /* A x B computed */
    double *buffer; /* Row to send to slave for processing */
    double *ans; /* Computed answer for master */
    int myid, numprocs;
    int i, j, numsent, sender;
    int row, anstype;
    double starttime, endtime;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (argc > 1) {
        nrows = atoi(argv[1]);
        ncols = nrows;
        if (myid == 0) {
            /* Master Code */
            aa = gen_matrix(nrows, ncols);
            bb = gen_matrix(ncols, nrows);
            cc1 = malloc(sizeof(double) * nrows * nrows);
            starttime = MPI_Wtime();
            buffer = (double*)malloc(sizeof(double) * ncols);
            numsent = 0;
            MPI_Bcast(bb, ncols*nrows, MPI_DOUBLE, MASTER, MPI_COMM_WORLD); /*broadcast bb to all slaves*/
            for (i = 0; i < min(numprocs-1, nrows); i++) { /*for each process or row*/
                for (j = 0; j < ncols; j++) { /*for each column*/
                    buffer[j] = aa[i * ncols + j]; /*get row of aa*/
                }
                MPI_Send(buffer, ncols, MPI_DOUBLE, i+1, i+1, MPI_COMM_WORLD); /*send row to slave*/
                numsent++; /*increment number of rows sent*/
            }
            ans = (double*)malloc(sizeof(double) * ncols);
            for (i = 0; i < nrows; i++) {
                MPI_Recv(ans, ncols, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &status);
                sender = status.MPI_SOURCE;
                anstype = status.MPI_TAG;
                for (i = 0; i < ncols; i++){
                    cc1[(anstype-1) * ncols + i] = ans[i];
                }
                if (numsent < nrows) {
                    for (j = 0; j < ncols; j++) {
                        buffer[j] = aa[numsent*ncols + j];
                    }
                    MPI_Send(buffer, ncols, MPI_DOUBLE, sender, numsent+1,
                             MPI_COMM_WORLD);
                    numsent++;
                } else {
                    MPI_Send(MPI_BOTTOM, 0, MPI_DOUBLE, sender, 0, MPI_COMM_WORLD);
                }
            }
            endtime = MPI_Wtime();
            printf("%f\n",(endtime - starttime));
        } else {
            /* Slave Code */
            buffer = (double*)malloc(sizeof(double) * ncols);
            bb = (double*)malloc(sizeof(double) * ncols*nrows);
            MPI_Bcast(bb, ncols*nrows, MPI_DOUBLE, MASTER, MPI_COMM_WORLD); /*get bb*/
            if (myid <= nrows) {
                while(1) {
                    MPI_Recv(buffer, ncols, MPI_DOUBLE, MASTER, MPI_ANY_TAG, MPI_COMM_WORLD, &status); /*receive a row of aa*/
                    if (status.MPI_TAG == 0){
                        break;
                    }
                    row = status.MPI_TAG; /*get row number*/
                    ans = (double*)malloc(sizeof(double) * ncols);
                    for (i = 0; i < ncols; i++){
                        ans[i]=0.0;
                    }
                    for (i=0; i<nrows; i++){
                        for (j = 0; j < ncols; j++) { /*for each column*/
                            ans[i] += buffer[j] * bb[j * ncols + i];
                        }
                    }
                    MPI_Send(ans, ncols, MPI_DOUBLE, MASTER, row, MPI_COMM_WORLD);
                }
            }
        } /*end slave code*/
    } else {
        fprintf(stderr, "Usage matrix_times_vector <size>\n");
    }
    MPI_Finalize();
    return 0;
}
Upvotes: 0
Views: 212
Reputation: 9489
This error message typically means that at least one of your MPI processes crashed and the whole MPI job was subsequently aborted. It can be caused by any sort of error, but most of the time it's a segmentation fault caused by an erroneous memory access.
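As a rough check, and assuming the launcher reports the terminating signal's number as the exit code (an assumption, not something verified for your MPI implementation), you can compare the code against SIGSEGV. The helper name looks_like_segfault is hypothetical:

```c
#include <assert.h>
#include <signal.h>

/* Returns 1 if `code` equals this platform's SIGSEGV number, 0 otherwise.
   Assumption: the exit code shown by the launcher is the signal number. */
int looks_like_segfault(int code) {
    assert(code >= 0);  /* exit codes and signal numbers are non-negative */
    return code == SIGSEGV;
}
```

On typical Linux systems SIGSEGV is 11, so looks_like_segfault(11) returns 1 there, which would be consistent with the EXIT CODE: 11 in the question.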
I didn't look closely at the code, so I can't say whether the logic works, but I can tell that this line has an issue:
MPI_Recv(&ans, nrows, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
MPI_COMM_WORLD, &status);
Indeed, there are two problems here:
- &ans is a double**, which is not what you want; I guess you wanted ans.
- ans hasn't been allocated, so it cannot be used as a receive buffer.
Try fixing this first and see what happens.
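A minimal sketch of both fixes, assuming a stand-in function fake_recv (hypothetical, not part of MPI) that writes count doubles into the buffer the way MPI_Recv would, so the example runs without an MPI installation. The only point is that the buffer is allocated before the call and passed as ans, not &ans:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Recv: writes `count` doubles into `buf`.
   Like MPI_Recv, it takes a void*, so the compiler cannot reject a double**. */
static void fake_recv(void *buf, int count) {
    double *out = buf;
    for (int k = 0; k < count; k++) {
        out[k] = (double)k;
    }
}

/* Allocates a buffer of `count` doubles and "receives" into it.
   Returns NULL if the allocation fails; the caller frees the result. */
double *receive_row(int count) {
    assert(count > 0);
    double *ans = malloc(sizeof(double) * (size_t)count);  /* allocate first */
    if (ans == NULL) {
        return NULL;
    }
    fake_recv(ans, count);  /* pass ans (a double*), not &ans (a double**) */
    return ans;
}
```

With real MPI the call would keep the same shape: allocate ans once, then pass ans itself as the first argument of MPI_Recv.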
EDIT: in your new code you allocate ans like this:
ans = (double*)malloc(sizeof(double) * ncols);
then you initialise it like this:
for (i = 0; i < nrows; i++) {
    ans[i]=0.0;
}
And use it like this:
MPI_Send(ans, nrows, MPI_DOUBLE, MASTER, row, MPI_COMM_WORLD);
or
MPI_Recv(ans, nrows, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
MPI_COMM_WORLD, &status);
This isn't consistent: is the size of ans ncols or nrows?
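One way to avoid this kind of mismatch is to let a single length drive the allocation, the initialisation and the message count. The sketch below assumes a hypothetical RowBuf type and rowbuf_new helper, neither of which is in your code. Since your program sets ncols = nrows the two sizes happen to coincide, but with a non-square matrix a mismatch like this would read or write past the end of the buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a row buffer that carries its own length. */
typedef struct {
    double *data;
    int len;
} RowBuf;

/* Allocates `len` doubles and zeroes them, using the same `len` for both.
   On allocation failure, data is NULL; the caller frees data when done. */
RowBuf rowbuf_new(int len) {
    assert(len > 0);
    RowBuf rb;
    rb.len = len;
    rb.data = malloc(sizeof(double) * (size_t)len);
    if (rb.data != NULL) {
        for (int k = 0; k < rb.len; k++) {
            rb.data[k] = 0.0;
        }
    }
    return rb;
}

/* With MPI available, the calls would then use the same length again:
   MPI_Send(rb.data, rb.len, MPI_DOUBLE, ...) and
   MPI_Recv(rb.data, rb.len, MPI_DOUBLE, ...). */
```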
And what is your new error message?
Upvotes: 2