Reputation: 23
I'm trying to run a code on multiple GPUs using OpenMP+OpenACC, with each OpenMP thread attached to a single GPU. What resources should I request on an HPC cluster to get maximum performance and scalability, e.g. --cpus-per-task, --ntasks-per-cpu, etc.?
!Attaching threads to a gpu
thd = omp_get_max_threads()
ide = omp_get_thread_num()
!!$acc init device_type(acc_device_nvidia)
call acc_set_device_num(ide, acc_device_nvidia)
Upvotes: 0
Views: 191
Reputation: 5646
I'd recommend asking your system administrator or the cluster's support team. Which batch scheduler options to use, and how they map onto the node topology, will be specific to the cluster.
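For orientation only, a Slurm batch script for the thread-per-GPU setup in the question might look like the sketch below. It assumes a Slurm cluster with four GPUs per node exposed as a generic resource named `gpu`. The counts, the `--gres` syntax, and the executable name `./my_app` are placeholders, so check them against your site's documentation.

```shell
#!/bin/bash
# Sketch only: one process on one node, one OpenMP thread per GPU.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4      # one CPU core per OpenMP thread (assumed: 4 threads)
#SBATCH --gres=gpu:4           # assumed: 4 GPUs per node, resource named "gpu"

# Match the thread count to the cores requested above, so that
# omp_get_thread_num() yields device numbers 0..3.
export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"

# ./my_app is a hypothetical executable name.
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" ./my_app
```

The thread count is kept equal to the GPU count because the question's code uses the thread number as the device number.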
Note that I generally recommend using MPI+OpenACC for multi-GPU programs. With OpenMP+OpenACC, codes often need to add domain decomposition, which is not natural for OpenMP but is for MPI. Using MPI also means the code can run across multiple nodes, not just within a single node.
Plus, with MPI you'll have a one-to-one relationship between rank and GPU, which greatly simplifies things. Assuming you're using NVIDIA devices, you can also take advantage of things like CUDA-aware MPI for GPUDirect communication and MPS (Multi-Process Service) to run multiple ranks per GPU.
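On a Slurm cluster, the one-rank-per-GPU layout could be requested roughly as follows. This is a sketch under assumptions: two nodes with four GPUs each, a generic resource named `gpu`, and a hypothetical executable `./my_mpi_app` in which each rank selects its device from its node-local rank.

```shell
#!/bin/bash
# Sketch only: one MPI rank per GPU across two nodes.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4    # assumed: one rank for each of 4 GPUs per node
#SBATCH --cpus-per-task=1      # one core per rank; raise if ranks are threaded
#SBATCH --gres=gpu:4           # per-node GPU count, resource named "gpu" (assumed)

# ./my_mpi_app is a hypothetical executable name.
srun ./my_mpi_app
```

Scaling to more GPUs then means raising `--nodes`, with the per-node settings left unchanged.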
While a few years old, here's a good video from GTC2016 on using MPI+OpenACC.
https://www.youtube.com/watch?v=xD42obq_ems
Upvotes: 1