user11594134

Reputation: 129

MPI dynamically allocate tasks

I have a C++ MPI program that runs on Windows HPC cluster (12 nodes, 24 cores per node).

There is one problem: each task can have a drastically different execution time, and there is no way I can tell that in advance. Distributing the tasks equally leaves a lot of processes waiting idle, which wastes a lot of computing resources and makes the total execution time longer.

I am thinking of one solution that might work: split the work into small parcels, and whenever a process finishes its current parcel it grabs the next unprocessed one.

As far as I understand, this scheme needs a universal counter across nodes/processes (so that different MPI processes do not execute the same parcel), and updating it needs some lock/sync mechanism. That certainly has overhead, but with proper tuning I think it can help improve performance.

I am not very familiar with MPI and have run into some implementation issues. I can think of two ways to implement this universal counter:

  1. Using MPI I/O: write this counter to a file and increase it whenever a parcel is taken (this will certainly need a file-locking mechanism).
  2. Using MPI one-sided communication/shared memory: put this counter in memory visible to all processes and increase it whenever a parcel is taken (this will certainly need a sync mechanism); a rough sketch follows this list.
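
From what I have read so far, I imagine the second option would look roughly like the sketch below. It is untested and only illustrates the idea: it assumes the MPI implementation on the cluster supports the MPI-3 one-sided routines MPI_Win_allocate and MPI_Fetch_and_op, and the parcel count and the "work" are just placeholders.

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int numParcels = 50;              // placeholder parcel count

    // Rank 0 owns the counter; the other ranks expose an empty window.
    int* counter = nullptr;
    MPI_Win win;
    MPI_Win_allocate(rank == 0 ? sizeof(int) : 0, sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &counter, &win);

    if (rank == 0) {                        // initialize the counter to 0
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        *counter = 0;
        MPI_Win_unlock(0, win);
    }
    MPI_Barrier(MPI_COMM_WORLD);            // everyone waits for the init

    const int one = 1;
    while (true) {
        int myParcel;
        // Atomically read the counter and add 1, so two processes can
        // never be handed the same parcel index.
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        MPI_Fetch_and_op(&one, &myParcel, MPI_INT, 0, 0, MPI_SUM, win);
        MPI_Win_unlock(0, win);

        if (myParcel >= numParcels) break;  // nothing left to do
        printf("rank %d works on parcel %d\n", rank, myParcel);
        // ... process parcel myParcel here ...
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}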

Unfortunately, I am not familiar with either technique and would like to explore the feasibility, implementation, and possible drawbacks of the two methods above. Sample code would be greatly appreciated.

If you have other ways to solve the problem or other suggestions, that would also be great. Thanks.

Follow-ups:

Thanks for all the useful suggestions. I have implemented a test program following the scheme of using process 0 as the task distributor.

#include <iostream>
#include <mpi.h>

using namespace std;

void doTask(int rank, int i){
    cout<<rank<<" got task "<<i<<endl;
}

int main ()
{
    int numTasks = 5000;
    int parcelSize = 100;

    int numParcels = (numTasks/parcelSize) + (numTasks%parcelSize==0?0:1);

    //cout<<numParcels<<endl;

    MPI_Init(NULL, NULL);

    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    MPI_Status status;
    MPI_Request request;

    int ready = 0;
    int i = 0;
    int maxParcelNow = 0;

    if(rank == 0){ //process 0 hands out parcel indices on request
        for(i = 0; i <numParcels; i++){
            MPI_Recv(&ready, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            //cout<<i<<"Yes"<<endl;
            MPI_Send(&i, 1, MPI_INT, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
            //cout<<i<<"No"<<endl;
        }
        maxParcelNow = i;
        cout<<maxParcelNow<<" "<<numParcels<<endl;
    }else{ //other processes request and execute parcels
        int counter = 0;
        while(true){
            if(maxParcelNow == numParcels) {
                cout<<"Yes exiting"<<endl;
                break;
            }
            //if(maxParcelNow == numParcels - 1) break;
            ready = 1;
            MPI_Send(&ready, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
            //cout<<rank<<"send"<<endl;
            MPI_Recv(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            //cout<<rank<<"recv"<<endl;
            doTask(rank, i);
        }
    }

    MPI_Bcast(&maxParcelNow, 1, MPI_INT, 0, MPI_COMM_WORLD);    

    MPI_Finalize();
    return 0;
}

It does not work and never stops. Any suggestions on how to make it work? Does this code reflect the idea correctly, or am I missing something? Thanks.

Upvotes: 0

Views: 2106

Answers (1)

AlexG

Reputation: 1103

[Converting my comments into an answer...]

Given n processes, you can have your first process, p0, dispatch tasks to the other n - 1 processes. First, it will use point-to-point communication to give each of the other n - 1 processes some work to do, and then it will block on a Recv. When any given process completes, say p3, it will send its result back to p0. At this point, p0 will send another message to p3 with one of two things:

1) Another task

or

2) Some kind of termination signal if there are no tasks remaining. (Using the 'tag' of the message is one easy way.)

Obviously, p0 will loop over that logic until there are no tasks left, at which point it will call MPI_Finalize too.

Unlike what you thought in your comments, this isn't round-robin. It first gives a job to every process, or worker, and then hands out another job whenever one completes...
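
If it helps, here is a minimal sketch of that scheme, not tested on your setup: the names TAG_WORK/TAG_STOP and the parcel count are made up for illustration. Rank 0 first seeds every worker, then answers each completion with either the next parcel or a stop message, and the workers decide to exit by looking at the tag.

#include <cstdio>
#include <mpi.h>

const int TAG_WORK = 1;   // message payload is a parcel index
const int TAG_STOP = 2;   // no work left: the worker should exit

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    const int numParcels = 50;              // placeholder parcel count

    if (rank == 0) {
        int next = 0;                       // next parcel to hand out
        int running = 0;                    // workers still holding a parcel

        // Seed every worker with one parcel (or stop it right away).
        for (int w = 1; w < nproc; ++w) {
            if (next < numParcels) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                ++next;
                ++running;
            } else {
                MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
            }
        }

        // Hand out the remaining parcels as results come back.
        while (running > 0) {
            int result;
            MPI_Status status;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            if (next < numParcels) {
                MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++next;
            } else {                        // payload is ignored on TAG_STOP
                MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --running;
            }
        }
    } else {
        while (true) {
            int task;
            MPI_Status status;
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            if (status.MPI_TAG == TAG_STOP) break;   // termination signal

            printf("rank %d got task %d\n", rank, task);
            int result = task;              // stand-in for the real result
            MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}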

Upvotes: 2
