Tom

Reputation: 151

specifying job arrays in LSF

My objective is to repeatedly run an R script, each time with a different set of parameters.

To do so, I have been using a bash script to pass the command-line parameters to the R script by looping through an input file, in which each line contains a different combination of 7 parameters.

The input file looks like this:

10 food 0.00005 0.002 1 OBSERVED 0
10 food 0.00005 0.002 1 OBSERVED 240
10 food 0.00005 0.002 1 OBSERVED 480
10 food 0.00005 0.002 1 OBSERVED 720
10 food 0.00005 0.002 1 OBSERVED 960
10 food 0.00005 0.002 1 OBSERVED 1200

The R script to which the command-line parameters are passed begins like this:

commandArgs(trailingOnly=FALSE)
A <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -6 )]) 
B <-             commandArgs()[as.numeric(length(commandArgs()) -5 )]  
C <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -4 )]) 
D <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -3 )]) 
E <- as.numeric (commandArgs()[as.numeric(length(commandArgs()) -2 )])
F <-             commandArgs()[as.numeric(length(commandArgs()) -1 )]  
G <- as.numeric (commandArgs()[as.numeric(length(commandArgs())    )]) 

The bash loop that reads these in and dispatches the R script is as follows:

#!/bin/bash
N=0
cat Input.txt | while read LINE ; do
N=$((N+1))
echo "R --no-save < /home/trichard/Script.R" "$LINE" |  bsub  -N -q priority -R "select[model==Xeon5450]"  
done

However, the problem is that there are millions of lines in Input.txt, so this approach is way too slow (it prevents other LSF users from submitting their own jobs).

So, the question is, how to do the above using an LSF array?

Upvotes: 3

Views: 1204

Answers (3)

Tom

Reputation: 151

The answer of Steve Weston works well; thanks!

However, in the LSF system, the maximum number of jobs within a single array is limited to about 1000. That means that when you have more than 1000 jobs, you need to submit multiple job arrays, like this:

#!/bin/bash
increment=1000                     # maximum number of jobs per array
total=$(wc -l < input.txt)         # one job per line of the input file

for ((s=1; s<=total; s+=increment)); do
  e=$((s+increment-1))
  ((e > total)) && e=$total        # clamp the last array to the final line
  echo "$s - $e"
  # Single quotes keep $(...) and ${LSB_JOBINDEX} unexpanded until the job runs
  echo 'R --no-save -f script.R --args $(sed "${LSB_JOBINDEX}q;d" input.txt)' |
    bsub -J "R_Job[$s-$e]" -N -q normal
done

So this successfully submits all jobs instantaneously, without the original job-by-job loop that essentially blocks other users and annoys your sysadmin. Thanks again!
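The range arithmetic can be checked without touching the scheduler. The following is only a sketch: array_ranges is a hypothetical helper (not part of LSF), and 2500 and 1000 are made-up values for the number of input lines and the per-array limit. It clamps the last range to the line count, so that no array element points past the end of the input file.

```shell
#!/bin/bash
# array_ranges is a hypothetical helper: print one "start-end" index
# range per job array for $1 total jobs and at most $2 jobs per array.
array_ranges() {
  local total=$1 increment=$2 s e
  for ((s = 1; s <= total; s += increment)); do
    e=$((s + increment - 1))
    # Clamp the last range so no array element points past the file end.
    if ((e > total)); then e=$total; fi
    echo "$s-$e"
  done
}

# Dry run: only print the job-array names; nothing is submitted.
for range in $(array_ranges 2500 1000); do
  echo "bsub -J \"R_Job[$range]\" ..."
done
```

Once the printed ranges look right, the echo in the loop can be swapped for the real bsub pipeline.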

I am posting this as an answer as it exceeds the max length for a comment.

Upvotes: 0

sluedtke

Reputation: 334

Maybe you should consider putting it all into R and using a 'foreach' loop construct with a proper parallelization framework like 'doMPI' (or pure Rmpi if you are really motivated ;-)). That way the job management system on the cluster has full control and you are basically submitting one single job.

This is a hint rather than a solution to your specific problem.

Upvotes: 0

Steve Weston

Reputation: 19677

The main trick is to extract the nth line from the input file. Assuming you're on a Unix-like system, you can use the "sed" command to do that. Here's an example:

N=$(wc -l < input.txt)
echo 'R --no-save -f Script.R --args $(sed "${LSB_JOBINDEX}q;d" input.txt)' |
  bsub -J "R_Job[1-$N]" -N -q priority -R "select[model==Xeon5450]"

Correct argument quoting is a bit tricky and very important in this example.

Note that this uses the R "--args" option to avoid warning messages about unrecognized arguments. I'd also suggest using commandArgs(trailingOnly=TRUE) in the R script so you only see the arguments of interest.
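To see what the sed expression does without submitting anything, here is a small sketch that can be run in any shell. It assumes a temporary sample file standing in for input.txt and sets LSB_JOBINDEX by hand (inside a real job array, LSF sets it for each element). The helper name nth_line is made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample file standing in for input.txt.
demo_file=$(mktemp)
printf '%s\n' \
  '10 food 0.00005 0.002 1 OBSERVED 0' \
  '10 food 0.00005 0.002 1 OBSERVED 240' \
  '10 food 0.00005 0.002 1 OBSERVED 480' > "$demo_file"

# nth_line is a hypothetical helper: print line $1 of file $2.
# In "Nq;d", d discards every line before N; on line N the q command
# prints the line and quits, so the rest of the file is never read.
nth_line() {
  sed "${1}q;d" "$2"
}

# LSF sets LSB_JOBINDEX for each array element; it is set by hand here.
LSB_JOBINDEX=2
args=$(nth_line "$LSB_JOBINDEX" "$demo_file")
echo "R would receive: $args"
```

Because sed quits as soon as it reaches the requested line, the cost of each lookup grows with the line number rather than with the size of the whole file.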

Upvotes: 3
