Networkguy

Ansys MPI_Init_thread: multiple pkey found / partition key table / MPI_IB_PKEY

There are problems with ANSYS. When I start it, it complains about some partitions. We are using Slurm. Is it complaining about the Slurm partitions in which the jobs run? But RDMA sounds more like it has to do with a hard drive partition. I am a bit confused about what the cause of the problem is: access to the file system, or the different queues (partitions) in Slurm? And how do I fix it? Has anyone encountered this bug before and maybe knows a solution?

It is running on a Slurm cluster with an NFS /home, an NFS /opt (the ANSYS install) and a BeeGFS /work directory (for models etc.).

cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device

Answers (2)

Networkguy

For a tcsh shell:

setenv MPI_IB_PKEY "0xffff"

This forces the application to use pkey 0xffff, the default InfiniBand partition with full membership (loosely, the "broadcast" "VLAN"). I am not sure why there is more than one partition to choose from.

For a bash shell:

export MPI_IB_PKEY="0xffff"
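As for why there are several entries: an InfiniBand pkey packs two fields into 16 bits. Bit 15 is the membership type (1 = full, 0 = limited) and bits 0 to 14 are the partition number, where 0x7fff is the default partition. The following is a minimal sketch that decodes the three keys from the log; decode_pkey is a hypothetical helper for illustration, not part of ANSYS or the MPI library.

```sh
# Decode an InfiniBand pkey: bit 15 is the membership type
# (1 = full, 0 = limited), bits 0-14 are the partition number.
# decode_pkey is a hypothetical helper for illustration only.
decode_pkey() {
    pkey=$(( $1 ))                       # accepts hex such as 0x8001
    part=$(( pkey & 0x7fff ))            # partition number
    if [ $(( pkey & 0x8000 )) -ne 0 ]; then
        member=full
    else
        member=limited
    fi
    printf '%s: partition 0x%04x, %s member\n' "$1" "$part" "$member"
}

# The three keys from the pkey table in the log:
for k in 0x8001 0x7fff 0xffff; do
    decode_pkey "$k"
done
# 0x8001: partition 0x0001, full member
# 0x7fff: partition 0x7fff, limited member
# 0xffff: partition 0x7fff, full member
```

Read this way, 0xffff and 0x7fff refer to the same default partition (full and limited membership), while 0x8001 is a separate partition, presumably set up by the fabric's subnet manager for some other purpose. Which key a job should use is a site decision. Whether the variable reaches the remotely started ranks depends on how ANSYS launches them, so setting it in a shell startup file that every node reads may be necessary (an assumption, not verified here).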

Bernd Schubert

cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed

-> This is an InfiniBand/RDMA message: the "partitions" are InfiniBand partition keys (pkeys), very likely totally unrelated to your file systems.
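To confirm this on a compute node, you can list the pkey table that each InfiniBand port exposes. The following is a minimal sketch, assuming the standard Linux sysfs layout /sys/class/infiniband/<device>/ports/<port>/pkeys/<index>. The function name list_pkeys is hypothetical, and the optional argument only exists so that a different root directory can be passed in.

```sh
# List the non-empty InfiniBand partition keys (pkeys) of every HCA port.
# Assumes the Linux sysfs layout <root>/<device>/ports/<port>/pkeys/<index>,
# where each file holds one pkey such as 0xffff and unused slots read 0x0000.
# list_pkeys is a hypothetical helper name, not an ANSYS or MPI tool.
list_pkeys() {
    root=${1:-/sys/class/infiniband}
    for f in "$root"/*/ports/*/pkeys/*; do
        [ -f "$f" ] || continue                # no match: no IB devices here
        pkey=$(cat "$f" 2>/dev/null) || continue
        [ "$pkey" = "0x0000" ] && continue     # skip empty table slots
        index=${f##*/}                         # .../pkeys/<index>
        portdir=${f%/pkeys/*}                  # .../<device>/ports/<port>
        port=${portdir##*/}
        dev=${portdir%/ports/*}
        dev=${dev##*/}
        echo "$dev port $port index $index: $pkey"
    done
    return 0
}

# Run on a compute node (e.g. inside an srun/salloc session):
list_pkeys
```

On the nodes from the question, this should presumably show the same three keys (0x8001, 0x7fff, 0xffff) that MPI_Init_thread printed, which points at the InfiniBand partition configuration rather than at Slurm or the NFS/BeeGFS mounts.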
