Reputation: 29
I want to run a simulation on a cluster managed by Slurm, and I intend to use OpenMPI for the parallel part. The simulator consists of a sequential optimizer implemented in Python, written by Facebook, and a C++ library that I provide to calculate the error of each optimizer step.
The optimizer provides parameters, which I pass to the C++ library to calculate the error. The error is evaluated at hundreds of points, and these partial results are simply summed to give the error for the current optimization step.
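To make the summation concrete, here is a minimal sketch in plain Python of how the points could be divided among workers and the partial errors added up. All names are hypothetical: `toy_point_error` is only a stand-in for the call into my C++ library, and the contiguous split is just one possible assignment of points to workers.

```python
def toy_point_error(params, i):
    # Hypothetical stand-in for the C++ library call at point i.
    return (params[0] * i - params[1]) ** 2


def split_points(n_points, n_workers):
    """Split point indices 0..n_points-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_points, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


def partial_error(params, point_indices, point_error):
    """What a single worker would compute: the sum over its own points."""
    return sum(point_error(params, i) for i in point_indices)


def total_error(params, n_points, n_workers, point_error):
    """What the optimizer needs: the sum of all partial errors."""
    return sum(
        partial_error(params, r, point_error)
        for r in split_points(n_points, n_workers)
    )
```

Because the total is a plain sum, the result does not depend on how the points are distributed, so the number of workers can be chosen freely.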
I need Slurm to start the optimizer and a set of processes that calculate the partial errors in parallel. I need something like a thread pool, which is what my current desktop implementation uses. The processes would wait for the parameters, and each would know which partial results it has to calculate. When finished, they would return the partial results to the optimizer and wait for the parameters of the next step. This way I could avoid recreating the processes in each step.
The optimizer would wait for all the partial results. Once it has received them all, it could make the next optimization step. The optimizer typically needs tens of thousands of steps.
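A rough sketch of the pattern I have in mind, assuming the mpi4py package is available on the cluster (I have not verified this). Rank 0 plays the optimizer, every rank computes its share of the points, and a broadcast of `None` tells the workers to exit. `point_error`, `N_POINTS` and the parameter update are hypothetical placeholders, not the real optimizer or my C++ library.

```python
try:
    from mpi4py import MPI  # assumption: mpi4py is installed on the cluster
except ImportError:
    MPI = None

N_POINTS = 300  # hypothetical number of evaluation points


def point_error(params, i):
    # Hypothetical stand-in for the C++ library call at point i.
    x = i / N_POINTS
    return (params[0] * x + params[1] - x * x) ** 2


def my_points(rank, size, n_points=N_POINTS):
    # Each rank takes every size-th point, so all points are covered once.
    return range(rank, n_points, size)


def step(comm, params):
    """Collective call made by every rank; returns (keep_going, total)."""
    params = comm.bcast(params, root=0)  # workers pass None, receive rank 0's value
    if params is None:                   # rank 0 broadcast None: shut down
        return False, None
    points = my_points(comm.Get_rank(), comm.Get_size())
    local = sum(point_error(params, i) for i in points)
    total = comm.reduce(local, root=0)   # default op is SUM; result only on rank 0
    return True, total


def run_workers(comm):
    # Persistent workers: block in bcast until the next parameters arrive.
    while step(comm, None)[0]:
        pass


def run_optimizer(comm, n_steps):
    params = [1.0, 0.0]
    history = []
    for _ in range(n_steps):
        _, err = step(comm, params)
        history.append(err)
        params = [params[0] * 0.9, params[1]]  # placeholder update, not a real optimizer
    step(comm, None)  # tell the workers to exit
    return history


def main():
    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        print(run_optimizer(comm, 5)[-1])
    else:
        run_workers(comm)


if __name__ == "__main__" and MPI is not None:
    main()
```

I assume this would be launched inside a Slurm job with something like `srun -n 64 python optimizer_mpi.py` (file name and task count are made up), but whether `srun` or `mpirun` is the right launcher probably depends on how MPI is set up on the cluster.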
I haven't found any similar use case on the Slurm site. This will be my first Slurm project.
How should I get started? Thank you in advance. Best regards: Balázs Bámer
Upvotes: 0
Views: 33