Christopher23

Reputation: 126

BLACS context value and multiple MPI communicators

I am trying to run some tests with BLACS/ScaLAPACK (C interface, Intel MKL version) making use of multiple MPI communicators: in particular, what I am trying to obtain is a set of BLACS contexts/grids corresponding (one-to-one) to a set of (disjoint) MPI communicators. While I have no problems working with a single "global" communicator (MPI_COMM_WORLD), I have some difficulties with the multiple-communicator case. I hope that you can give me some suggestions for the following problem.

I am a bit confused about how the BLACS context value is updated after a call to Cblacs_gridinit: let's suppose we start with a "global" context corresponding to the MPI_COMM_WORLD communicator. I can obtain the corresponding BLACS context with the call:

MPI_Comm globalCommunicator(MPI_COMM_WORLD);

MKL_INT globalContext(Csys2blacs_handle(globalCommunicator));

and create a grid on it with

Cblacs_gridinit(&globalContext,
                &c_blacsGridOrdering,
                i_nTaskRow,
                i_nTaskCol);

with, for example, char c_blacsGridOrdering('R') and MKL_INT variables i_nTaskRow and i_nTaskCol holding the grid dimensions.

The globalContext value in this case is 0.
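
For completeness, a minimal, self-contained version of the setup above looks roughly like this (assuming the C BLACS wrappers are declared in MKL's mkl_blacs.h; the 2 x 3 grid size is just an example for 6 tasks):

#include <mpi.h>
#include <mkl_blacs.h>   // assumption: C BLACS wrappers declared here
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm globalCommunicator(MPI_COMM_WORLD);
    MKL_INT  globalContext(Csys2blacs_handle(globalCommunicator));

    char    c_blacsGridOrdering('R');
    MKL_INT i_nTaskRow(2), i_nTaskCol(3);   // example: 2 x 3 grid for 6 tasks

    Cblacs_gridinit(&globalContext, &c_blacsGridOrdering, i_nTaskRow, i_nTaskCol);

    // Query the grid to confirm this task's coordinates in it.
    MKL_INT nprow, npcol, myrow, mycol;
    Cblacs_gridinfo(globalContext, &nprow, &npcol, &myrow, &mycol);
    std::printf("context %d: task at (%d,%d) in a %d x %d grid\n",
                (int)globalContext, (int)myrow, (int)mycol, (int)nprow, (int)npcol);

    Cblacs_gridexit(globalContext);
    MPI_Finalize();
    return 0;
}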

At some point in my code, which uses 6 MPI tasks, an MPI communicator (localCommunicator) corresponding to the group of 4 tasks with ids [0;3] is created: at this point I would like to create a new BLACS context (localContext) for this "local" communicator and a local grid on it. I can do that with the code

MKL_INT localContext(Csys2blacs_handle(localCommunicator));

Cblacs_gridinit(&localContext,
                &c_blacsGridOrdering,
                i_nTaskRowLocal,
                i_nTaskColLocal);

where the above function call sequence is performed only by the tasks included in the local communicator.
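
For context, the local communicator itself is created along these lines (MPI_Comm_split is just one possible way to build it; the essential part is the Csys2blacs_handle/Cblacs_gridinit pair on the sub-communicator):

int worldRank;
MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);

// Ranks 0-3 get color 0, the others opt out with MPI_UNDEFINED.
const int color((worldRank < 4) ? 0 : MPI_UNDEFINED);
MPI_Comm localCommunicator(MPI_COMM_NULL);
MPI_Comm_split(MPI_COMM_WORLD, color, worldRank, &localCommunicator);

if (localCommunicator != MPI_COMM_NULL)   // only tasks 0-3 touch BLACS here
{
    MKL_INT localContext(Csys2blacs_handle(localCommunicator));

    char    c_blacsGridOrdering('R');
    MKL_INT i_nTaskRowLocal(2), i_nTaskColLocal(2);   // example: 2 x 2 grid on 4 tasks

    Cblacs_gridinit(&localContext, &c_blacsGridOrdering,
                    i_nTaskRowLocal, i_nTaskColLocal);
    // ... work with localContext ...
    Cblacs_gridexit(localContext);
}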

The localContext value, after the call to Csys2blacs_handle, is equal to 1 on each task of the local communicator, but it is then overwritten and set to 0 by the subsequent call to Cblacs_gridinit.

Obviously this causes problems in the rest of my code since, for example, if I try to retrieve the MPI communicator corresponding to localContext with a call to Cblacs2sys_handle, I get a six-task communicator corresponding to the initial globalCommunicator (a small diagnostic sketch follows below).
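
This is the kind of check I mean (it prints 6 here, while I would expect 4):

// Map the local context back to an MPI communicator and print its size.
MPI_Comm recoveredCommunicator(Cblacs2sys_handle(localContext));

int recoveredSize;
MPI_Comm_size(recoveredCommunicator, &recoveredSize);
std::printf("communicator behind localContext has %d tasks\n", recoveredSize);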

Most likely I am making some very silly error (a missing call to some BLACS function?), or the interplay between multiple MPI communicators and BLACS contexts/grids is not clear to me, but I cannot find what is wrong with my code.

Do you have any suggestions concerning the above problem? Many thanks for your help!

UPDATE 1

I have an update to my question which could be useful for finding a solution, or at least an explanation, for the observed problem: the issue is no longer present if I initialize the first grid (the one using the global context) on a task grid whose size is such that all available MPI tasks are included, for example with MKL_INT i_nTaskRow(1) and i_nTaskCol equal to the size of MPI_COMM_WORLD (see the sketch below). Is this behavior expected from BLACS? Many thanks again for your support!
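
Concretely, this is a sketch of the workaround (the 1 x N shape is just the simplest way I found to include every task in the first grid):

int worldSize;
MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

MKL_INT globalContext(Csys2blacs_handle(MPI_COMM_WORLD));

// First grid spans ALL tasks: 1 row, worldSize columns.
char    c_blacsGridOrdering('R');
MKL_INT i_nTaskRow(1), i_nTaskCol(static_cast<MKL_INT>(worldSize));

Cblacs_gridinit(&globalContext, &c_blacsGridOrdering, i_nTaskRow, i_nTaskCol);

// With this, the localContext created afterwards keeps its own (distinct) value.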

Upvotes: 4

Views: 571

Answers (1)

Matthias Beaupère

Reputation: 1847


In case it helps anyone, for me the problem was that I used Csys2blacs_handle() (the C wrapper) and then blacs_gridinit_() (the Fortran interface). It seems silly, but you should double-check that you did not mix them.

A remark for debugging purposes: with the Fortran interface the context integer looks like an MPI communicator handle (a very large integer), while with the C wrapper it is a small integer (0, 1, ...).

The solution was to replace blacs_gridinit_() with Cblacs_gridinit().
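As a sketch, the consistent all-C sequence looks like this (someCommunicator is a placeholder for whatever MPI_Comm you already have):

// Stay entirely within the C wrappers: Csys2blacs_handle + Cblacs_gridinit,
// never blacs_gridinit_ from the Fortran interface on a handle obtained here.
MKL_INT context(Csys2blacs_handle(someCommunicator));   // someCommunicator: your MPI_Comm

char    order('R');
MKL_INT nprow(2), npcol(2);

Cblacs_gridinit(&context, &order, nprow, npcol);   // C wrapper, matching the handle above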

Upvotes: 1
