hduece


SLURM job failing with sbatch, successful with srun

A researcher is submitting a job to our cluster that is failing when run with sbatch, but succeeding when run with srun. Any ideas on why this could be? I’ve included the error messages and the slurm script below:

Error message:


Unable to init server: Could not connect: Connection refused

(canavier_model_changes_no_plots.py:1589287): Gdk-CRITICAL **: 22:46:57.434: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

can't open DISPLAY

Based on that error, my first thought was that the problem lies in the code Slurm is running rather than in Slurm itself. But if that were the case, I'm not sure why it would work with srun.

Here is the slurm script:


#!/bin/bash
#SBATCH --job-name=networkmodel
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10
#SBATCH --mem-per-cpu=4G
#SBATCH --time=00-00:05:00

python3 canavier_model_changes_no_plots.py

She thought it might be related to the matplotlib code in her script, but the job still failed after she removed it. To repeat: the code runs with srun and fails with sbatch.


Answers (1)

damienfrancois


The error message indicates that the job is running code that tries to create a GUI window through X11: GDK cannot open a display, so it fails. Matplotlib may well be the cause, for example through an interactive backend or a plt.show() call. The script should only write its results to files and never try to open GUI windows.
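Below is a minimal sketch of that advice. It assumes the GUI call comes from Matplotlib; the Gdk-CRITICAL line suggests a GTK-based backend was selected. The sketch switches to the non-interactive Agg backend before pyplot is imported, then saves the figure to a file instead of showing it. The function `save_plot` and the file name `output.png` are hypothetical placeholders, not part of the original script. Setting `export MPLBACKEND=Agg` in the batch script before the `python3` line should have the same effect without editing the Python code.

```python
import os

import matplotlib

# Force the non-interactive Agg backend: it renders straight to image files
# and never needs an X11 DISPLAY. Select it before importing pyplot.
matplotlib.use("Agg")

import matplotlib.pyplot as plt


def save_plot(path="output.png"):
    """Draw a simple figure and write it to disk instead of showing it.

    'save_plot' and the default file name are hypothetical placeholders.
    """
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    fig.savefig(path)  # write the image file; no window is opened
    plt.close(fig)     # release the figure's memory
    return path


if __name__ == "__main__":
    # DISPLAY may be unset inside a batch job; Agg does not use it.
    print("DISPLAY =", os.environ.get("DISPLAY"))
    print("backend =", matplotlib.get_backend())
    save_plot()
```

If the job still fails with this backend forced, the GTK call probably comes from a library other than Matplotlib.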

