Reputation: 65
I'm performing a finite element analysis with software called ElmerFEM. The input to the simulation, a mesh file, is partitioned into 40 segments, and each segment can be distributed to a node for concurrent analysis. My issue is that I don't have access to 40 nodes. To my understanding, if you allocate more than one task per node with Slurm, each node runs those tasks concurrently. Normally this would be fine, but my nodes don't have enough memory for that to be acceptable. I need each node to run only one of these tasks at a time.
Let's say I have split the input mesh into 40 partitions (so 40 Slurm tasks are needed), and that I only have access to 5 nodes, each with only four physical cores. Would it be possible to allocate 8 tasks to each of these nodes (8 * 5 = 40) and have each node run only one task at a time? I know this is inefficient, but it is the only solution available for my workload.
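For reference, a layout like this only fits if Slurm is told it may place more tasks than physical cores. A sketch of the relevant directives (assuming a Slurm version that supports the `-O`/`--overcommit` flag; note that overcommitting only allows the placement, it does not by itself serialize the tasks on a node):

```shell
# Sketch: request 40 tasks spread 8-per-node over 5 four-core nodes.
# --overcommit permits more tasks than CPUs per node; without it, Slurm
# rejects the request because no node advertises 8 cores.
#SBATCH --nodes=5
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=8
#SBATCH --overcommit
```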
I have the following sbatch script. Currently, if I run it, I get an error about there being no available configuration for the given requirements (Slurm is looking for nodes with 8 cores, and my compute cluster has no nodes like that).
#!/bin/bash
#SBATCH --job-name=elm_x
#SBATCH --output=log/%x-%A_%a.out
#SBATCH --error=log/%x-%A_%a.err
#SBATCH --nodes=5
#SBATCH --cpus-per-task=1
#SBATCH --ntasks=40
#SBATCH --ntasks-per-node=8
srun --mpi=pmix /usr/bin/ElmerSolver_mpi case.sif
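To double-check what Slurm advertises for each node (and therefore why it rejects an 8-tasks-per-node request on 4-core nodes), one can run something like the following; `%n`, `%c`, and `%m` are sinfo's format specifiers for hostname, CPU count, and memory in MB:

```shell
# List every node with its CPU count and memory as Slurm sees them.
sinfo -N -o "%n %c %m"
```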
Upvotes: 0
Views: 62