Reputation: 10629
I have a script test.R that takes arguments arg1 and arg2 and outputs an arg1-arg2.csv file.
I would like to run test.R in 6 parallel sessions (I am on a 6-core CPU) and in the background. How can I do it?
I am on Linux.
Upvotes: 2
Views: 5665
Reputation: 368599
You did not provide a reproducible example, so I am making one up. As you are on Linux, I am also switching to littler, which was after all written for the very purpose of scripting with R.
#!/usr/bin/env r
#
# a simple example writing a csv file named after its two arguments
if (is.null(argv) || length(argv) != 2) {
    cat("Usage: myscript.r arg1 arg2\n")
    q()
}
filename <- sprintf("%s-%s.csv", argv[1], argv[2])
Sys.sleep(60)  # do some real work here instead
write.csv(matrix(rnorm(9), 3, 3), file = filename)
and you can then launch this either from the command line, as I do here, or from another (shell) script. The key is the & at the end, which sends the job to the background:
edd@max:/tmp/tempdir$ ../myscript.r a b &
[1] 19575
edd@max:/tmp/tempdir$ ../myscript.r c d &
[2] 19590
edd@max:/tmp/tempdir$ ../myscript.r e f &
[3] 19607
edd@max:/tmp/tempdir$
The [n] prefix indicates which job has been launched in the background; the number that follows is the process id, which you can use to monitor or kill the job. After a little while we get the results:
edd@max:/tmp/tempdir$
[1] Done ../myscript.r a b
[2]- Done ../myscript.r c d
[3]+ Done ../myscript.r e f
edd@max:/tmp/tempdir$ ls -ltr
total 12
-rw-rw-r-- 1 edd edd 192 Jun 24 09:39 a-b.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 c-d.csv
-rw-rw-r-- 1 edd edd 193 Jun 24 09:40 e-f.csv
edd@max:/tmp/tempdir$
You may want to read up on Unix shells to learn more about &, the fg and bg commands, job control, and so on.
Lastly, all this can a) also be done with Rscript, though argument handling is slightly different, and b) there are the CRAN packages getopt and optparse to facilitate working with command-line arguments.
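For comparison, a minimal sketch of the same script under Rscript (my own adaptation, not part of the original answer), where arguments come in via commandArgs() rather than littler's argv:
#!/usr/bin/env Rscript
#
# the same example as above, with Rscript-style argument handling
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 2) {
    cat("Usage: myscript.R arg1 arg2\n")
    quit()
}
filename <- sprintf("%s-%s.csv", args[1], args[2])
Sys.sleep(60)  # do some real work here instead
write.csv(matrix(rnorm(9), 3, 3), file = filename)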
Upvotes: 3
Reputation: 61077
I suggest using the doParallel backend for the foreach package. The foreach package provides a nice syntax for writing loops and takes care of combining the results. doParallel connects it to the parallel package that has shipped with R since version 2.14. On other setups (older versions of R, clusters, whatever) you could simply change the backend without touching any of your foreach loops. The foreach package in particular has excellent documentation, so it is really easy to use.
If you are going to write the results to individual files, then the result-combining features of foreach won't be of much use to you. So people might argue that direct use of parallel would be better suited to your application. Personally I find the way foreach expresses looping concepts much easier to use.
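A minimal sketch of that approach, assuming 6 workers and hard-coded argument pairs (the pairs and the loop body are illustrative, not from the original answer):
library(doParallel)  # also loads foreach and parallel

cl <- makeCluster(6)  # one worker per core
registerDoParallel(cl)

# hypothetical argument pairs, one row per run
params <- data.frame(arg1 = c("a", "c", "e"),
                     arg2 = c("b", "d", "f"),
                     stringsAsFactors = FALSE)

# each iteration writes its own file, so the combined return value is ignored
foreach(i = seq_len(nrow(params))) %dopar% {
    filename <- sprintf("%s-%s.csv", params$arg1[i], params$arg2[i])
    write.csv(matrix(rnorm(9), 3, 3), file = filename)
}

stopCluster(cl)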
Upvotes: 4
Reputation: 10215
The state of the art would be to use the parallel package, but when I am lazy I simply start 6 batch files (cmd, assuming Windows) that each call Rscript.
You can set a parameter in the cmd file
SET ARG1=myfile
Rscript test.r
and read it in the script via
Sys.getenv("ARG1")
With 6 batch files I can also chain multiple runs inside each batch file, to be sure that the cores are always busy.
Upvotes: 1