Reputation: 10629
I have a script test.R that takes arguments arg1 and arg2 and outputs an arg1-arg2.csv file.
I would like to run test.R in 6 parallel sessions (I am on a 6-core CPU) and in the background. How can I do it?
I am on Linux.
Upvotes: 2
Views: 5665
Reputation: 368599
You did not provide a reproducible example, so I am making one up. As you are on Linux, I am also switching to littler, which was after all written for the very purpose of scripting with R.
#!/usr/bin/env r
#
# a simple example writing a csv file named after its two arguments
if (is.null(argv) || length(argv) != 2) {
    cat("Usage: myscript.r arg1 arg2\n")
    q()
}
filename <- sprintf("%s-%s.csv", argv[1], argv[2])
Sys.sleep(60)  # do some real work here instead
write.csv(matrix(rnorm(9), 3, 3), file = filename)
and you can then launch this either from the command line, as I do here, or from another (shell) script. The key is the & at the end, which sends the job to the background:
edd@max:/tmp/tempdir$ ../myscript.r a b &
[1] 19575
edd@max:/tmp/tempdir$ ../myscript.r c d &
[2] 19590
edd@max:/tmp/tempdir$ ../myscript.r e f &
[3] 19607
edd@max:/tmp/tempdir$
The [n] prefix indicates which job has been launched in the background; the number that follows is the process id, which you can use to monitor or kill the job. After a little while we get the results:
edd@max:/tmp/tempdir$
[1] Done ../myscript.r a b
[2]- Done ../myscript.r c d
[3]+ Done ../myscript.r e f
edd@max:/tmp/tempdir$ ls -ltr
total 12
-rw-rw-r-- 1 edd edd 192 Jun 24 09:39 a-b.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 c-d.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 e-f.csv
edd@max:/tmp/tempdir$
You may want to read up on Unix shells to learn more about &, the fg and bg commands, job control, and so on.
Lastly, all this can a) also be done with Rscript, though argument handling is slightly different, and b) there are the CRAN packages getopt and optparse to facilitate working with command-line arguments.
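For comparison, a minimal sketch of the same script under Rscript (my own adaptation, not part of the original answer), where arguments come in via commandArgs() rather than littler's argv:
#!/usr/bin/env Rscript
#
# the same example as above, with Rscript-style argument handling
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 2) {
    cat("Usage: myscript.R arg1 arg2\n")
    quit()
}
filename <- sprintf("%s-%s.csv", args[1], args[2])
Sys.sleep(60)  # do some real work here instead
write.csv(matrix(rnorm(9), 3, 3), file = filename)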
Upvotes: 3
Reputation: 61077
I suggest using the doParallel backend for the foreach package. The foreach package provides a nice syntax for writing loops and takes care of combining the results. doParallel connects it to the parallel package that has shipped with R since version 2.14. On other setups (older versions of R, clusters, whatever) you could simply change the backend without touching any of your foreach loops. The foreach package in particular has excellent documentation, so it is really easy to use.
If you are going to write the results to individual files, then the result-combining features of foreach won't be of much use to you. So people might argue that direct use of parallel would be better suited to your application. Personally I find the way foreach expresses looping concepts much easier to use.
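A minimal sketch of that approach, assuming 6 workers and hard-coded argument pairs (the pairs and the loop body are illustrative, not from the original answer):
library(doParallel)  # also loads foreach and parallel

cl <- makeCluster(6)  # one worker per core
registerDoParallel(cl)

# hypothetical argument pairs, one row per run
params <- data.frame(arg1 = c("a", "c", "e"),
                     arg2 = c("b", "d", "f"),
                     stringsAsFactors = FALSE)

# each iteration writes its own file, so the combined return value is ignored
foreach(i = seq_len(nrow(params))) %dopar% {
    filename <- sprintf("%s-%s.csv", params$arg1[i], params$arg2[i])
    write.csv(matrix(rnorm(9), 3, 3), file = filename)
}

stopCluster(cl)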
Upvotes: 4
Reputation: 10215
The state of the art would be to use the parallel package, but when I am lazy I simply start 6 batch files (cmd, assuming Windows) that each call Rscript.
You can set a parameter in the cmd file
SET ARG1=myfile
Rscript test.r
and read it in the script via
Sys.getenv("ARG1")
With 6 batch files I can also chain multiple runs inside each batch file, to be sure that the cores are always busy.
Upvotes: 1