Star

Reputation: 53

How to parallelize nested for loops in bash that call an R script

Is it possible to parallelize the following code?

for word in $(cat FileNames.txt)
do 
   for i in {1..22}
   do  
      Rscript assoc_test.R...........

   done >> log.txt
done 

I have been trying to parallelize it but have had no luck so far. I tried putting () around the Rscript assoc_test.R........... call, followed by &, but it produces no results and the log file ends up empty. Any suggestions/help would be appreciated. TIA.

Upvotes: 2

Views: 656

Answers (2)

Ole Tange

Reputation: 33685

GNU Parallel is made for replacing loops, so the double loop can be replaced by:

parallel Rscript assoc_test.R... \> log.{1}.{2} :::: FileNames.txt ::: {1..22}

Upvotes: 2

joanis

Reputation: 12193

You can change your script to output the commands to run, and feed the results into GNU parallel:

for word in $(cat FileNames.txt)
do 
   for i in {1..22}
   do  
      echo Rscript assoc_test.R........... \> log.$word.$i
   done
done | parallel -j 4

Some details:

  • parallel -j 4 will keep 4 jobs running at a time - replace 4 with the number of CPUs you want to use.
  • Notice I redirect the output to log.$word.$i and escape the redirection operator > by using \>. The escape makes echo print a literal >, so the redirection happens when parallel runs the line rather than in the loop itself. I still need to test this, but the point is that since you're going parallel, you don't want to jumble all your outputs together.
  • Make sure you escape anything else the echo might interpret. The output should be valid command lines that parallel can run.
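One way to check the escaping is to print the generated command lines before piping them to parallel. The following is a minimal sketch, not a tested pipeline: the helper name emit_commands and the sample names geneA and geneB are hypothetical, and the elided Rscript arguments are left exactly as in the question.

```shell
#!/usr/bin/env bash
# Preview the command lines that would be handed to parallel.
# The sample file contents are hypothetical stand-ins for FileNames.txt.
tmp=$(mktemp)
printf '%s\n' geneA geneB > "$tmp"

# emit_commands FILE: print one command line per (word, i) pair.
emit_commands() {
  local word i
  while read -r word; do
    for i in {1..22}; do
      # \> keeps the redirection literal, so it ends up in the printed text
      echo Rscript assoc_test.R........... \> "log.$word.$i"
    done
  done < "$1"
}

# Show only the first two generated lines for inspection.
emit_commands "$tmp" | sed -n '1,2p'
```

If the printed lines look like commands you would type by hand, with a plain > before each log file name, the same loop can be piped into parallel -j 4.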

As an alternative to parallel, you can also use xargs with -I {} (the older -i spelling is deprecated in GNU xargs), adding -P to run several jobs at once. See this question for more information.
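A rough sketch of the xargs route, assuming an xargs that supports -P (GNU and BSD versions do). It passes each word and number as two arguments via -n 2 instead of a placeholder string. The echo inside sh -c is only a placeholder for the real Rscript call, whose arguments are elided in the question, and the sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Run one job per (word, i) pair, at most 4 at a time, with one log per job.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' geneA geneB > FileNames.txt   # hypothetical sample input

# Emit "word i" pairs, one per line; xargs hands each pair to sh as $1 and $2.
while read -r word; do
  for i in {1..22}; do
    printf '%s %s\n' "$word" "$i"
  done
done < FileNames.txt |
  xargs -P 4 -n 2 sh -c 'echo "placeholder for the Rscript call: $1 $2" > "log.$1.$2"' _

# Count the per-job log files that were written.
ls log.* | wc -l
```

Because every job writes to its own log.$word.$i file, output from concurrent jobs cannot interleave.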

Upvotes: 3
