Chintan Pathak

Reputation: 301

How to spawn a long-running parallel process in R that runs an R script?

I have a Shiny application that collects parameters from users and uses them to run a simulation that takes a long time (roughly 1-3 days). I want to tell the user to check back once the simulation has finished, and to let them submit further input scenarios in the meantime. So I want to spawn a subprocess that runs the simulation in the background while my Shiny app keeps running.

I tried the packages 'subprocess' and 'processx', which allow one to spawn external processes and interact with them. However, I want the spawned process to run an entire R script, i.e. run source('simulation_script.R'). I do not need to share any data with the parent Shiny app: I read inputs from files and write outputs to files, which can then be shown to the Shiny app user. Any pointers on how to achieve this would be helpful.

Upvotes: 0

Views: 1052

Answers (2)

MrGumble

Reputation: 5776

When your computation takes much longer than the user's interactive session is expected to last, you should split the entire process into 3 parts: data input, computation, and presentation of results.

Now there are several reasons for this:

  1. Your user is likely to log off their Shiny session during a 1-3 day interval. You must provide a way for them to "reconnect" to their results.
  2. If you spawn a sub-process from a Shiny session, is it entirely independent from its parent process? What happens if you restart your Shiny server? Does it kill the spawned sub-process?
  3. How do you handle multiple users? If Jack, Bob, and I each start a simulation before lunch, your machine will be carrying the Shiny server plus 3 simulations. Roger comes back from lunch and tries to submit his simulation - does the Shiny server have enough resources left to serve Roger's Shiny session?

So, you handle your simulation in 3 parts:

  1. A Shiny app that takes the user's order and submits it to a queue.
  2. A queue and a computation part that run independently of the other 2 parts. Their sole responsibility is to take the next order in the queue, mark it as being processed, do the computations, and, when done or on error, mark the order accordingly and save the output.
  3. A Shiny app that can display the output / results.

The queue is basically a database (as simple as SQLite, or MySQL or MS SQL, whatever you have lying around), as long as it supports access from multiple processes. The computation part is then a script that loops endlessly, asking for the next task and running it. This allows you to scale it (simply by launching several instances of the script), move it to more powerful calculation nodes, etc., without affecting the presentation in the Shiny apps.
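A minimal sketch of such a queue, assuming the DBI and RSQLite packages are installed. The table layout, the function names (open_queue, submit_task, process_next_task) and the placeholder simulation are all hypothetical and only illustrate the idea; they are not part of any package. A task is claimed with a single UPDATE statement so that two workers should not pick up the same order.

```r
library(DBI)

# Open the shared queue database and create the (hypothetical) tasks table.
open_queue <- function(path) {
  con <- dbConnect(RSQLite::SQLite(), path)
  dbExecute(con, "
    CREATE TABLE IF NOT EXISTS tasks (
      id     INTEGER PRIMARY KEY,
      params TEXT NOT NULL,
      status TEXT NOT NULL DEFAULT 'queued',
      worker TEXT,
      result TEXT
    )")
  con
}

# Called by the input app: add one order to the queue.
submit_task <- function(con, params) {
  dbExecute(con, "INSERT INTO tasks (params) VALUES (?)", params = list(params))
}

# Called by a worker: claim the oldest queued task, run it, record the outcome.
# Returns FALSE when the queue is empty.
process_next_task <- function(con, worker, run_simulation) {
  claimed <- dbExecute(con, "
    UPDATE tasks SET status = 'processing', worker = ?
    WHERE id = (SELECT id FROM tasks WHERE status = 'queued' ORDER BY id LIMIT 1)",
    params = list(worker))
  if (claimed == 0) return(FALSE)

  task <- dbGetQuery(con, "
    SELECT id, params FROM tasks WHERE status = 'processing' AND worker = ?",
    params = list(worker))

  # Mark the order as done or failed, and save the output or error message.
  outcome <- tryCatch(
    list(status = "done", result = as.character(run_simulation(task$params))),
    error = function(e) list(status = "failed", result = conditionMessage(e))
  )
  dbExecute(con, "UPDATE tasks SET status = ?, result = ? WHERE id = ?",
            params = list(outcome$status, outcome$result, task$id))
  TRUE
}

# Demo with a placeholder simulation; a real worker would instead run
#   repeat if (!process_next_task(con, worker, run_simulation)) Sys.sleep(60)
con <- open_queue(tempfile(fileext = ".sqlite"))
submit_task(con, "scenario-A")
submit_task(con, "scenario-B")
worker <- paste0("worker-", Sys.getpid())
while (process_next_task(con, worker, function(p) toupper(p))) {}
tasks <- dbGetQuery(con, "SELECT params, status, result FROM tasks ORDER BY id")
dbDisconnect(con)
```

The results app would only ever read the tasks table. The sketch does not handle rows left in 'processing' by a worker that crashed; a real setup would need some way to detect and requeue those.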

Upvotes: 3

ismirsehregal

Reputation: 33580

To use r_bg just wrap your source() call in a function (which should be self-contained) like this:

library(callr)

# create dummy script
writeLines('writeLines(as.character(Sys.time()), "myResult.csv")', 'myRScript.R')

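# execute dummy script in a background R process; r_bg() returns immediately
p <- r_bg(function() source('myRScript.R'))

# block until the script has finished (in a Shiny app, poll p$is_alive() instead)
p$wait()

# read results (the file has a single line and no header row)
read.csv('myResult.csv', header = FALSE)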
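For a long-running job it may help to keep the process handle, send the script's console output to a log file, and check on the job without blocking. The following is only a sketch under those assumptions: the script, log and output file names are made-up temp files, and the one-line script stands in for the real simulation.

```r
library(callr)

# Hypothetical stand-in for the real simulation script.
script   <- tempfile(fileext = ".R")
out_file <- tempfile(fileext = ".txt")
log_file <- tempfile(fileext = ".log")
writeLines(sprintf('writeLines("finished", %s)', deparse(out_file)), script)

# Start the script in a background R process. stdout goes to a log file and
# "2>&1" sends stderr to the same file.
job <- r_bg(
  function(script) source(script),
  args   = list(script),
  stdout = log_file,
  stderr = "2>&1"
)

# A Shiny app would call job$is_alive() from a timer instead of blocking;
# job$wait() is used here only so that the example terminates.
job$wait()

# get_result() raises an error if the script failed in the background process.
outcome <- tryCatch(
  {
    job$get_result()
    "done"
  },
  error = function(e) "failed"
)
```

The handle lives in the R session that created it, so it does not by itself cover the case where the Shiny session or server restarts while the job is running.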

Upvotes: 2
