Reputation: 126
I am trying to run some tests with BLACS/ScaLAPACK (the C interface, Intel MKL version) using multiple MPI communicators: specifically, what I am trying to obtain is a set of BLACS contexts/grids corresponding (one-to-one) to a set of disjoint MPI communicators. While I have no problems working with a single "global" communicator (MPI_COMM_WORLD), I have some difficulties with the multiple-communicator case. I hope you can give me some suggestions for the following problem.
I am a bit confused about how the value of the BLACS context variable is updated by a call to Cblacs_gridinit: let's suppose we start with a "global" context corresponding to the MPI_COMM_WORLD communicator. I can obtain the corresponding BLACS context with the call:
MPI_Comm globalCommunicator(MPI_COMM_WORLD);
MKL_INT globalContext(Csys2blacs_handle(globalCommunicator));
and create a grid on it with
Cblacs_gridinit(&globalContext,
                &c_blacsGridOrdering,
                i_nTaskRow,
                i_nTaskCol);
with, for example, char c_blacsGridOrdering('R'), and with the MKL_INT variables i_nTaskRow and i_nTaskCol holding the number of rows and columns of the task grid.
The globalContext value in this case is 0.
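For reference, putting these pieces together, a minimal self-contained version of this first step looks like this (a sketch: the 2x3 grid shape is an arbitrary choice for my 6 tasks, and I assume the Csys2blacs_handle/Cblacs_* prototypes are available from mkl_blacs.h; if your MKL version does not provide them there, they have to be declared extern "C" by hand):

#include <mpi.h>
#include <mkl_blacs.h>   // assumed to provide the C-interface BLACS prototypes
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    // step 1: MPI communicator -> BLACS system handle
    MPI_Comm globalCommunicator(MPI_COMM_WORLD);
    MKL_INT globalContext(Csys2blacs_handle(globalCommunicator));

    // step 2: turn the handle into a grid context
    // (2x3 grid for 6 tasks, a hypothetical choice)
    char c_blacsGridOrdering('R');
    MKL_INT i_nTaskRow(2), i_nTaskCol(3);
    Cblacs_gridinit(&globalContext, &c_blacsGridOrdering,
                    i_nTaskRow, i_nTaskCol);

    // sanity check: where is this task in the grid?
    MKL_INT i_nRow, i_nCol, i_myRow, i_myCol;
    Cblacs_gridinfo(globalContext, &i_nRow, &i_nCol, &i_myRow, &i_myCol);
    std::printf("context %d: task (%d,%d) of %dx%d\n",
                (int)globalContext, (int)i_myRow, (int)i_myCol,
                (int)i_nRow, (int)i_nCol);

    Cblacs_gridexit(globalContext);
    MPI_Finalize();
    return 0;
}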
At some point in my code, which uses 6 MPI tasks, an MPI communicator (localCommunicator) corresponding to the group of the 4 tasks with IDs [0;3] is created: at this point I would like to create a new BLACS context (localContext) for this "local" communicator and a local grid on it. I can do that with the code
MKL_INT localContext(Csys2blacs_handle(localCommunicator));
Cblacs_gridinit(&localContext,
                &c_blacsGridOrdering,
                i_nTaskRowLocal,
                i_nTaskColLocal);
where the above function call sequence is performed only by the tasks included in the local communicator.
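For completeness, the local communicator and the calls above can be driven along these lines (a sketch: MPI_Comm_split and the 2x2 local grid shape are illustrative choices, and c_blacsGridOrdering is the same 'R' as before):

int i_worldRank(0);
MPI_Comm_rank(MPI_COMM_WORLD, &i_worldRank);

// ranks 0..3 form the "local" group, ranks 4..5 the rest
const int i_color(i_worldRank < 4 ? 0 : 1);
MPI_Comm localCommunicator(MPI_COMM_NULL);
MPI_Comm_split(MPI_COMM_WORLD, i_color, i_worldRank, &localCommunicator);

if (i_color == 0)   // only the 4 tasks of the local communicator
{
    MKL_INT localContext(Csys2blacs_handle(localCommunicator));
    MKL_INT i_nTaskRowLocal(2), i_nTaskColLocal(2);   // hypothetical 2x2 grid
    Cblacs_gridinit(&localContext, &c_blacsGridOrdering,
                    i_nTaskRowLocal, i_nTaskColLocal);
}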
The localContext value, after the call to Csys2blacs_handle, is equal to 1 (for each task of the local communicator), but it is then modified and set to 0 by the subsequent call to Cblacs_gridinit.
Obviously this causes some problems in the rest of my code since, for example, if I try to retrieve the MPI communicator corresponding to localContext with a call to Cblacs2sys_handle, I get a six-task communicator corresponding to the initial globalCommunicator.
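A small check along these lines (a sketch, reusing the variables from above) shows the wrong mapping:

MPI_Comm retrievedCommunicator(Cblacs2sys_handle(localContext));
int i_retrievedSize(0), i_expectedSize(0);
MPI_Comm_size(retrievedCommunicator, &i_retrievedSize);
MPI_Comm_size(localCommunicator, &i_expectedSize);
// in the failing case this prints a 6-task communicator where 4 is expected
std::printf("localContext maps to a %d-task communicator (expected %d)\n",
            i_retrievedSize, i_expectedSize);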
Most likely I am making some very stupid error (a missing call to some BLACS function?) or the interplay between multiple MPI communicators and BLACS contexts/grids is not clear to me, but I cannot find what is wrong with my code.
Do you have any suggestions concerning the above problem? Many thanks for your help!
I have an update for my question which could be useful for finding a solution, or at least an explanation, for the observed problem: the described problem is no longer present if I initialize the first grid (the one using the global context) on a task grid whose size is such that all available MPI tasks are included, for example with MKL_INT i_nTaskRow(1) and i_nTaskCol equal to the size of MPI_COMM_WORLD (see the sketch below). Is this behavior expected from BLACS? Many thanks again for your support!
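In code, the workaround looks like this (again a sketch):

int i_worldSize(0);
MPI_Comm_size(MPI_COMM_WORLD, &i_worldSize);
MKL_INT i_nTaskRow(1);
MKL_INT i_nTaskCol(static_cast<MKL_INT>(i_worldSize));   // 1 x world-size grid: every task is in the grid
Cblacs_gridinit(&globalContext, &c_blacsGridOrdering, i_nTaskRow, i_nTaskCol);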
Upvotes: 4
Views: 571
Reputation: 1847
In case it helps anyone: for me the problem was that I used Csys2blacs_handle() (the C wrapper) and then blacs_gridinit_() (the Fortran interface). It sounds silly, but you should double-check that you did not mix them.
A remark for debugging purposes: with the Fortran interface the context integer is similar to an MPI communicator (a very large integer), whereas with the C wrapper it is a small integer (0, 1, ...).
The solution was to replace blacs_gridinit_() with Cblacs_gridinit().
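Spelled out with the variable names from the question (a sketch), the broken and the fixed pairings are:

MKL_INT localContext(Csys2blacs_handle(localCommunicator));   // C wrapper

// wrong: Fortran-interface grid creation on a C-wrapper handle
// (note the Fortran routine takes every argument by pointer)
// blacs_gridinit_(&localContext, &c_blacsGridOrdering,
//                 &i_nTaskRowLocal, &i_nTaskColLocal);

// right: stay within the C wrappers
Cblacs_gridinit(&localContext, &c_blacsGridOrdering,
                i_nTaskRowLocal, i_nTaskColLocal);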
Upvotes: 1