I'm using MPI_Wtime() to measure the runtime of a parallel application, but the values it reports don't match the real elapsed time:
Running the application on 4 cores reports 0.000061 s, although the run actually takes around 30 seconds.
Running it on 50 cores reports 0.000308 s (the run finishes almost instantly).
Multiplying the workload by 10, still on 50 cores, reports 0.000752 s (about 2 minutes in real time).
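#include <mpi.h>
#include <cstdio>
#include <fstream>
using namespace std;

// Parent and Child are the application's own classes (not shown).
// t1 and t2 are declared outside main in this version (not shown).

int main(int argc, char* argv[]) {
    ofstream file;
    file.open("primes.txt");   // opening with the default mode truncates the file
    file.close();

    MPI_Init(&argc, &argv);
    MPI_Status status;
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        t1 = MPI_Wtime();
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)
        Parent parent(size);       // contains the loop that manages the children
    else
        Child child(size, rank);
    if (rank == 0) {
        t2 = MPI_Wtime();
    }
    MPI_Finalize();
    if (rank == 0)
        printf("Runtime = %f\n", t2 - t1);
}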
Parent contains a loop that manages the children.
These numbers make no sense. What am I doing wrong?
For reference, MPI_Wtick() returns 1e-9.
Thanks to @Giles Gouaillardet and @Victor Eijkhout for answering.
After making t1 and t2 local variables and adding an MPI_Barrier before each MPI_Wtime() call, I got a result that makes sense.
Running the code on 4 cores now reports 20.277840 s, which sounds correct.
Before the change, the same test reported 0.000061 s, which made no sense at all.
Thank you.