Reputation: 423
(the problem is embarrassingly parallel)
Consider an array of 12 cells:
|__|__|__|__|__|__|__|__|__|__|__|__|
and four (4) CPUs.
Naively, I would run 4 parallel jobs and feeding 3 cells to each CPU.
|__|__|__|__|__|__|__|__|__|__|__|__|
=========|========|========|========|
1 CPU 2 CPU 3 CPU 4 CPU
BUT, it appears, that each cell has different evaluation time, some cells are evaluated very quickly, and some are not.
So, instead of wasting "relaxed CPU", I think to feed EACH cell to EACH CPU at time and continue until the entire job is done.
Namely:
at the beginning:
|____|____|____|____|____|____|____|____|____|____|____|____|
1cpu 2cpu 3cpu 4cpu
if, 2cpu finished his job at cell "2", it can jump to the first empty cell "5" and continue working:
|____|done|____|____|____|____|____|____|____|____|____|____|
1cpu 3cpu 4cpu 2cpu
|-------------->
if 1cpu finished, it can take sixth cell:
|done|done|____|____|____|____|____|____|____|____|____|____|
3cpu 4cpu 2cpu 1cpu
|------------------------>
and so on, until the full array is done.
QUESTION:
I do not know a priori which cell is "quick" and which cell is "slow", so I cannot spread cpus according to the load (more cpus to slow, less to quick). How one can implement such algorithm for dynamic evaluation with MPI?
Thanks!!!!!
UPDATE
I use a very simple approach, how to divide the entire job into chunks, with IO-MPI:
given: array[NNN] and nprocs - number of available working units:
for (int i=0;i<NNN/nprocs;++i)
{
do_what_I_need(start+i);
}
MPI_File_write(...);
where "start" corresponds to particular rank number. In simple words, I divide the entire NNN array into fixed size chunk according to the number of available CPU and each CPU performs its chunk, writes the result to (common) output and relaxes.
IS IT POSSIBLE to change the code (Not to completely re-write in terms of Master/Slave paradigm) in such a way, that each CPU will get only ONE iteration (and not NNN/nprocs) and after it completes its job and writes its part to the file, will Continue to the next cell and not to relax.
Thanks!
Upvotes: 2
Views: 347
Reputation: 1151
To answer your updated question:
Under the master/slave (or worker pool if that's how you prefer it to be labelled) model, you will basically need a task scheduler. The master should have information about what work has been done and what still needs to be done. The master will give each process some work to be done, then sit and wait until a process completes (using nonblocking receives and a wait_all). Once a process completes, have it send the data to the master then wait for the master to respond with more work. Continue this until the work is done.
Upvotes: 2
Reputation: 5095
You want to implement a kind of client-server architecture where you have workers asking the server for work whenever they are out of work.
Depending on the size of the chunks and the speed of your communication between workers and server, you may want to adjust the size of the chunks sent to workers.
Upvotes: 2
Reputation: 74485
There is a well known parallel programming pattern, known under many names, some of which are: bag of tasks, master / worker, task farm, work pool, etc. The idea is to have a single master process, which distributes cells to the other processes (workers). Each worker runs an infinite loop in which it waits for a message from the master, computes something and then returns the result. The loop is terminated by having the master send a message with a special tag. The wildcard tag value MPI_ANY_TAG
can be used by the worker to receive messages with different tags.
The master is more complex. It also runs a loop but until all cells have been processed. Initially it sends each worker a cell and then starts a loop. In this loop it receives a message from any worker using the wildcard source value of MPI_ANY_SOURCE
and if there are more cells to be processed, sends one of them to the same worker that have returned the result. Otherwise it sends a message with a tag set to the termination value.
There are many many many readily available implementations of this model on the Internet and even some on Stack Overflow (for example this one). Mind that this scheme requires one additional MPI process that often does very little work. If this is unacceptable, one can run a worker loop in a separate thread.
Upvotes: 4