m13op22

Reputation: 2337

GNU Parallel not passing strings to MATLAB

I'm trying to use GNU Parallel to run a set of experiments using MATLAB on our supercomputer, which uses SLURM. I have a text file containing combinations of 4 parameters that are read in and passed to a MATLAB function. That text file is called gnu_parameters.txt and has 4 columns separated by a single space.

fs_method data_name use_vars 1
fs_method1 data_name use_vars 1
fs_method3 data_name use_vars 1 

where the parameters in columns 1-3 should be read in as strings, and parameter 4 is a number.

I want to run each combination of parameters in parallel to speed up the process. My SLURM script is below. When I tell GNU Parallel where to put each parameter using the notation {1} {2} {3} {4}, MATLAB reports that it doesn't recognize the variable fs_method. The job log shows that fs_method reaches MATLAB without quotes, so it is treated as a variable name rather than a string. To fix that, I tried adding single quotes in the SLURM script like so:

#!/bin/bash -l
#SBATCH --time=4-00:00:00
#SBATCH --ntasks=1
#SBATCH --mem=1200g
#SBATCH --tmp=500g
#SBATCH --cpus-per-task=115
#SBATCH --mail-type=FAIL,END
#SBATCH --mail-user=myemail
#SBATCH -p groupPartition
cd $WRK_DIR
module load matlab
module load parallel
export JOBS_PER_NODE=$(( $SLURM_CPUS_ON_NODE / $SLURM_CPUS_PER_TASK ))
echo $JOBS_PER_NODE
cat gnu_parameters.txt | parallel --jobs $JOBS_PER_NODE --joblog tasklog.log --progress --colsep ' ' 'matlab -nodisplay -r "run_holdout_parallel('{1}', '{2}', '{3}', {4});exit" ' 

Below are excerpts from the log file, the error file, and the output file.

Log

Seq Host    Starttime   JobRuntime  Send    Receive Exitval Signal  Command
1   :   1719498346.300      14.911  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method, data_name, use_vars, 1);exit" 
2   :   1719498361.751      14.387  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method1, data_name, use_vars, 1);exit" 
3   :   1719498376.666      14.385  0   298 0   0   matlab -nodisplay -r "run_holdout_parallel(fs_method3, data_name, use_vars, 1);exit" 

Error File

local:1/0/100%/0.0s sh: /dev/tty: No such device or address

local:1/0/100%/0.0s sh: /dev/tty: No such device or address

local:1/0/100%/0.0s {Unrecognized function or variable 'fs_method'.
}

local:0/1/100%/15.0s 

Output file

                            < M A T L A B (R) >
                  Copyright 1984-2023 The MathWorks, Inc.
             R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
                              January 30, 2024

 
To get started, type doc.
For product information, visit www.mathworks.com.
 

                            < M A T L A B (R) >
                  Copyright 1984-2023 The MathWorks, Inc.
             R2023b Update 7 (23.2.0.2515942) 64-bit (glnxa64)
                              January 30, 2024

 
To get started, type doc.
For product information, visit www.mathworks.com.

But that returns the same error. How can I get these parameters passed as strings to MATLAB? Is there a better way to run these experiments in parallel than the one I'm using?

Upvotes: 2

Views: 80

Answers (3)

Ole Tange

Reputation: 33740

I hate quoting. man parallel says:

Conclusion: If this is confusing consider avoiding having to deal with quoting by writing a small script or a function (remember to export -f the function) and have GNU parallel call that.

So in your case make a function:

run_holdout() {
  echo This should run_holdout_parallel on $1 $2 $3 $4
  matlab -nodisplay -r "run_holdout_parallel(\"$1\", \"$2\", \"$3\", $4);exit"
}

When you can run that on the command line:

$ run_holdout fs_method3 data_name use_vars 1 

and that works, then parallelize with:

$ export -f run_holdout
$ ... | parallel --colsep ' ' run_holdout {1} {2} {3} {4}
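
A fuller sketch of this approach, assuming a bash SLURM script and the gnu_parameters.txt file from the question. The helper name matlab_expr and the DRY_RUN switch are hypothetical; they are only there so the quoting can be checked without starting MATLAB.

```shell
# Hypothetical helper (not from the question): build the MATLAB call for one row.
matlab_expr() {
  # The literal double quotes in the format string become MATLAB string delimiters.
  printf 'run_holdout_parallel("%s", "%s", "%s", %s)' "$1" "$2" "$3" "$4"
}

run_holdout() {
  call=$(matlab_expr "$1" "$2" "$3" "$4")
  if [ -n "${DRY_RUN:-}" ]; then
    # Hypothetical dry-run switch: show the expression without starting MATLAB.
    printf '%s\n' "$call;exit"
  else
    matlab -nodisplay -r "$call;exit"
  fi
}

# Check the quoting for one row before parallelizing.
DRY_RUN=1 run_holdout fs_method3 data_name use_vars 1
# prints: run_holdout_parallel("fs_method3", "data_name", "use_vars", 1);exit

# In the bash SLURM script, export both functions and let GNU Parallel split
# each line of the parameter file on spaces (not run here):
#   export -f matlab_expr run_holdout
#   parallel --colsep ' ' run_holdout {1} {2} {3} {4} < gnu_parameters.txt
```

GNU Parallel also has a --dry-run option that prints the commands it would run, which is another way to inspect what each job receives.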

Upvotes: 3

Edric

Reputation: 25160

A few suggestions that don't directly answer your question, but might help you get your job done more efficiently.

  • Check with your sysadmin to see if you've got MATLAB Parallel Server available. This is generally more "efficient" in terms of consuming licences on a cluster - your approach will use multiple "full MATLAB" licences, rather than simply Parallel Server worker licences
  • If you can use MATLAB Parallel Server, you can use simple high-level MATLAB constructs like parfor to scale your work up to your cluster
  • If you don't have Parallel Server, you might yet have Parallel Computing Toolbox available, and therefore be able to run parpool('local') on the node allocated to you by SLURM
  • These days, matlab -batch is a much better option than matlab -r for this sort of thing.
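
As a rough illustration of the last point, here is a sketch of a per-row wrapper that uses -batch instead of -r. The function name run_holdout_batch is hypothetical, and the sketch assumes a MATLAB release that supports -batch (R2019a or later; the R2023b in the question qualifies). With -batch, MATLAB exits on its own and, as far as I know, returns a non-zero exit code when the statement errors, which should make the Exitval column in the GNU Parallel job log more informative.

```shell
# Hypothetical wrapper name; the MATLAB function is the one from the question.
run_holdout_batch() {
  # -batch runs the statement without the desktop and exits when it finishes,
  # so no trailing ";exit" is needed. Single quotes here are literal characters
  # inside the double-quoted shell string, so MATLAB sees char vectors.
  matlab -batch "run_holdout_parallel('$1', '$2', '$3', $4)"
}
```

In a bash script this could be exported with export -f run_holdout_batch and called from GNU Parallel as run_holdout_batch {1} {2} {3} {4} with --colsep ' '.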

Upvotes: 2

m13op22

Reputation: 2337

The issue is how the quotes are being escaped. The way that works for me is

'matlab -nodisplay -r "run_holdout_parallel(\\\"{1}\\\", \\\"{2}\\\", \\\"{3}\\\", {4});exit" '

For some reason, using the answer in the comments

'matlab -nodisplay -r "run_holdout_parallel('"'"{1}"'"', '"'"{2}"'"', '"'"{3}"'"', {4});exit" '

ran without errors, but no commands were executed.
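
For anyone wondering why the single quotes in my original script vanished, here is a minimal sketch of the shell behaviour with fixed values instead of {1} placeholders. It shows only one level of shell parsing; the command line is parsed again after GNU Parallel substitutes the values, which is presumably why the working version above needs more backslashes than this demo does.

```shell
# The inner single quotes close and reopen the outer single-quoted string,
# so the shell removes them before the command ever sees the text.
stripped='run_holdout_parallel('fs_method', 1)'
printf '%s\n' "$stripped"
# prints: run_holdout_parallel(fs_method, 1)

# Escaped double quotes inside a double-quoted string survive as literal characters.
kept="run_holdout_parallel(\"fs_method\", 1)"
printf '%s\n' "$kept"
# prints: run_holdout_parallel("fs_method", 1)
```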

Upvotes: 2
