Reputation: 51
There are problems with ANSYS. When I start it, it complains about some partitions. We are using Slurm. Does it complain about the Slurm partitions in which the jobs run? But RDMA sounds more like a hard-drive partition. I am a bit confused about what the cause of the problem is: access to the file system, or the different queues (partitions) in Slurm? And how can it be fixed? Has anyone encountered this error before and perhaps knows a solution?
It is running on a Slurm cluster with an NFS /home, an NFS /opt (ANSYS install) and a BeeGFS /work directory (for models etc.).
cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device
Upvotes: 2
Views: 1377
Reputation: 51
For a tcsh shell:
setenv MPI_IB_PKEY "0xffff"
For a bash shell:
export MPI_IB_PKEY="0xffff"
This forces the application to use the "broadcast" VLAN (pkey 0xffff). I am not sure why there is more than one partition to choose from.
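If the solver is started from a Slurm batch script, the variable can be exported before the solver call so every MPI rank inherits it. A minimal sketch, assuming a bash job script; the Slurm partition name, node counts, definition file path and the exact cfx5solve arguments are placeholders for whatever you run today:

#!/bin/bash
#SBATCH --job-name=cfx_run
#SBATCH --partition=compute        # placeholder Slurm partition name
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16

# Pick the full-membership InfiniBand partition key before MPI initializes
export MPI_IB_PKEY="0xffff"

# Placeholder solver invocation; expand the Slurm node list into a host list for CFX
hosts=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | paste -sd, -)
cfx5solve -def /work/model.def -par-dist "$hosts"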
Upvotes: 2
Reputation: 1
cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
-> This is InfiniBand/RDMA, very likely totally unrelated to your file systems.
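To confirm that the message refers to InfiniBand partition keys rather than disk partitions, the pkey table from the error can be read out of sysfs on a compute node. A minimal check, assuming the HCA appears as mlx4_0 on port 1 (adjust to whatever ls /sys/class/infiniband/ reports on your nodes):

# List the IB partition key table for one HCA port; among the populated
# entries you should see 0xffff plus the extra keys (e.g. 0x8001, 0x7fff)
# mentioned in the MPI_Init_thread error.
for f in /sys/class/infiniband/mlx4_0/ports/1/pkeys/*; do
    printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
done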
Upvotes: 0