Reputation: 2337
I'm trying to use GNU parallel to run a set of experiments using MATLAB on our supercomputer, which uses SLURM. I have a text file containing combinations of 4 parameters that are read in and passed to a MATLAB function. That text file is called gnu_parameters.txt and has 4 columns separated by a single space.
fs_method data_name use_vars 1
fs_method1 data_name use_vars 1
fs_method3 data_name use_vars 1
where parameters in columns 1-3 should be read in as a string, and parameter 4 is a number.
I want to run each combination of parameters in parallel to speed up the process. My SLURM script is below, but when I tell GNU parallel where to put each parameter using the notation {1} {2} {3} {4}, I get an error that MATLAB doesn't recognize the variable fs_method. Looking at the log tells me that the error means fs_method isn't read as a string by MATLAB. To fix that, I tried adding single quotes in the SLURM script like so:
#!/bin/bash -l
#SBATCH --time=4-00:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1200g
#SBATCH --tmp=500g
#SBATCH --cpus-per-task=115
#SBATCH --mail-type=FAIL,END
#SBATCH --mail-user=myemail
#SBATCH -p groupPartition
cd $WRK_DIR
module load matlab
module load parallel
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
echo $JOBS_PER_NODE
cat gnu_parameters.txt | parallel --jobs $JOBS_PER_NODE --joblog tasklog.log --progress --colsep ' ' 'matlab -nodisplay -r "run_holdout_parallel('{1}', '{2}', '{3}', {4});exit" '
Below are excerpts from the log file, the error file, and the output file.
Log
Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
1 : 1719498346.300 14.911 0 298 0 0 matlab -nodisplay -r "run_holdout_parallel(fs_method, data_name, use_vars, 1);exit"
2 : 1719498361.751 14.387 0 298 0 0 matlab -nodisplay -r "run_holdout_parallel(fs_method1, data_name, use_vars, 1);exit"
3 : 1719498376.666 14.385 0 298 0 0 matlab -nodisplay -r "run_holdout_parallel(fs_method3, data_name, use_vars, 1);exit"
Error File
local:1/0/100%/0.0s sh: /dev/tty: No such device or address
local:1/0/100%/0.0s sh: /dev/tty: No such device or address
local:1/0/100%/0.0s {Unrecognized function or variable 'fs_method'.
}
local:0/1/100%/15.0s
Output file
< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
January 30, 2024
To get started, type doc.
For product information, visit www.mathworks.com.
< M A T L A B (R) >
Copyright 1984-2023 The MathWorks, Inc.
R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
January 30, 2024
To get started, type doc.
For product information, visit www.mathworks.com.
But that returns the same error. How can I get these parameters passed as strings to MATLAB? Is there a better way to run these experiments in parallel than the method I'm using?
Upvotes: 2
Views: 80
Reputation: 33740
I hate quoting. man parallel says:
Conclusion: If this is confusing consider avoiding having to deal with quoting by writing a small script or a function (remember to export -f the function) and have GNU parallel call that.
So in your case make a function:
run_holdout() {
  # Print what is about to run, then pass the three strings and one number to MATLAB
  echo "This should run_holdout_parallel on $1 $2 $3 $4"
  matlab -nodisplay -r "run_holdout_parallel(\"$1\", \"$2\", \"$3\", $4);exit"
}
When you can run that on the command line:
$ run_holdout fs_method3 data_name use_vars 1
and that works, then parallelize with:
$ export -f run_holdout
$ ... | parallel --colsep ' ' run_holdout {1} {2} {3} {4}
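Before launching on the cluster, one way to check the quoting is a dry run that only prints the statement MATLAB would receive. The following is a minimal sketch, not a definitive setup: build_matlab_cmd is a hypothetical helper introduced here for illustration (it is not part of GNU parallel or MATLAB), and the sample rows are copied from the question.

```shell
# Hypothetical helper (an assumption for illustration): prints the MATLAB
# statement for one parameter row instead of starting MATLAB.
build_matlab_cmd() {
  printf 'run_holdout_parallel("%s", "%s", "%s", %s);exit\n' "$1" "$2" "$3" "$4"
}

# Dry run over sample rows taken from the question's gnu_parameters.txt.
while read -r fs_method data_name use_vars flag; do
  build_matlab_cmd "$fs_method" "$data_name" "$use_vars" "$flag"
done <<'EOF'
fs_method data_name use_vars 1
fs_method1 data_name use_vars 1
fs_method3 data_name use_vars 1
EOF
```

If the printed statements show the first three arguments wrapped in double quotes, the same quoting inside run_holdout should hand MATLAB strings rather than bare identifiers.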
Upvotes: 3
Reputation: 25160
A few suggestions that don't directly answer your question, but might help you get your job done more efficiently:
Use parfor to scale your work up to your cluster.
Or use parpool('local') on the node allocated to you by SLURM.
matlab -batch is a much better option than matlab -r for this sort of thing.
Upvotes: 2
Reputation: 2337
The issue is how the quotes are being escaped. The way that works for me is
'matlab -nodisplay -r "run_holdout_parallel(\\\"{1}\\\", \\\"{2}\\\", \\\"{3}\\\", {4});exit" '
For some reason, using the answer in the comments
'matlab -nodisplay -r "run_holdout_parallel('"'"{1}"'"', '"'"{2}"'"', '"'"{3}"'"', {4});exit" '
ran fine but no commands were executed.
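To see why the plain single quotes from the question never reach MATLAB, here is a small sketch that needs neither parallel nor matlab. The variable name template is only for illustration. The shell ends the single-quoted string at every inner ', so those quote characters are consumed by the shell before GNU parallel ever sees the command.

```shell
# Same argument as in the question's SLURM script. Each inner ' closes or
# reopens the single-quoted string, so the quotes around {1}, {2}, {3}
# are removed by the shell and are not part of the resulting word.
template='matlab -nodisplay -r "run_holdout_parallel('{1}', '{2}', '{3}', {4});exit"'

# Show the command template exactly as GNU parallel would receive it.
printf '%s\n' "$template"
```

The printed template has no quotes around {1}, {2} and {3}, which matches the commands recorded in the job log in the question.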
Upvotes: 2