Reputation: 2724
This seems very related to several other questions that have been asked (this one for example), but I can't quite figure out how to do exactly what I want. Maybe replacement functions are the wrong tool for the job, which would also be a perfectly acceptable answer. I am much more familiar with Python than R and I can easily think of how I want to do it in Python but I can't quite get my head around how to approach it in R.
The problem: I am trying to modify an object in place within a function, without having to return it, but I don't need to pass in the value that modifies it, because this value is the result of a function call that's already contained in the object.
More specifically, I have a list (technically it's an S3 class, but I don't think that's actually relevant to this issue) that contains some things relating to a process started with a processx::process$new() call. For reproducibility, here's a toy shell script you can run, and the code to get my res object:
echo '
echo $1
sleep 1s
echo "naw 1"
sleep 1s
echo "naw 2"
sleep 1s
echo "naw 3"
sleep 1s
echo "naw 4"
sleep 1s
echo "naw 5"
echo "All done."
' > naw.sh
Then my wrapper is something like this:
run_sh <- function(.args, ...) {
  p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
  return(list(process = p, orig_args = .args, output = NULL))
}
res <- run_sh(c("naw.sh", "hello"))
And res should look like:
$process
PROCESS 'sh', running, pid 19882.
$orig_args
[1] "naw.sh" "hello"
$output
NULL
So, the specific issue here is a bit peculiar to process$new, but I think the general principle is relevant. I am trying to collect all the output from this process after it has finished, but you can only call res$process$read_all_output_lines() (or its sibling functions) once: the first call returns whatever is in the buffer, and subsequent calls return nothing. Also, I am going to start a bunch of these and then come back to "check on them", so I can't just call res$process$read_all_output_lines() right away, because then it would wait for the process to finish before the function returns, which is not what I want.
So I'm trying to store the output of that call in res$output and then just keep that and return it on subsequent calls. So I need a function that modifies res in place with res$output <- res$process$read_all_output_lines().
Here's what I tried, based on guidance like this, but it didn't work.
get_output <- function(.res) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    invisible()
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    return(.res$output)
  }
}
`update_output<-` <- function(.res, ..., value) {
  .res$output <- value
  .res
}
Calling get_output(res) works the first time, but it does not store the output in res$output to be accessed later, so subsequent calls return nothing.
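The behaviour described above is consistent with R's copy-on-modify semantics for lists: a replacement function called inside another function rebinds only that function's local argument. The following is a minimal base-R sketch of the effect, with no processx involved; fill_output is a hypothetical stand-in for get_output, and the update_output<- here drops the unused ... argument.

```r
# Replacement function: returns a modified copy of its first argument.
`update_output<-` <- function(.res, value) {
  .res$output <- value
  .res
}

# Hypothetical stand-in for get_output(): the assignment below rebinds the
# local name `.res` only; the caller's list is never touched.
fill_output <- function(.res) {
  update_output(.res) <- "some output"
  .res$output
}

res <- list(output = NULL)
returned <- fill_output(res)

returned             # the value computed inside the function
is.null(res$output)  # the caller's list still has no stored output
```

So the first call can return the freshly read lines, while the list named res in the calling frame stays as it was.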
I also tried something like this:
`get_output2<-` <- function(.res, value) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    .res
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    print(value)
    .res
  }
}
This just throws away value, but it feels silly because you have to call it with an assignment, like get_output2(res) <- "fake", which I hate.
Obviously I could also just return the modified res object, but I don't like that because then the user has to know to do res <- get_output(res), and if they forget to do that (the first time) then the output is lost to the ether and can never be recovered. Not good.
Any help is much appreciated!
Upvotes: 1
Views: 143
Reputation: 173793
After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:
check_result <- function(process_list)
{
  # Capture the name of the passed object as a string
  list_name <- deparse(substitute(process_list))
  # Check the object exists in the calling environment
  if(!exists(list_name, envir = parent.frame()))
    stop("Object '", list_name, "' not found")
  # Create a local copy of the passed object in function scope
  copy_of_process_list <- get(list_name, envir = parent.frame())
  # If the process has completed and its output has not been stored yet,
  # write the output to the copy and assign the copy to the name of the
  # object in the calling frame. The is.null() guard keeps a later call
  # from overwriting the stored output with an empty second read.
  if(is.null(copy_of_process_list$output) &&
     length(copy_of_process_list$process$get_exit_status()) > 0)
  {
    copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
    assign(list_name, copy_of_process_list, envir = parent.frame())
  }
  print(copy_of_process_list)
}
This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.
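To see the name-capture mechanism on its own, here is a minimal base-R sketch that uses a plain list instead of a processx object. The helper set_field and its arguments are hypothetical names chosen for illustration; it only works when the caller passes a bare variable name, which is the same assumption check_result makes.

```r
# Hypothetical helper: overwrites one field of a list in the caller's frame.
set_field <- function(x, field, value) {
  # Recover the name the caller used for `x` (x itself is never evaluated)
  x_name <- deparse(substitute(x))
  caller <- parent.frame()
  if (!exists(x_name, envir = caller)) {
    stop("Object '", x_name, "' not found")
  }
  # Work on a local copy, then rebind the caller's name to that copy
  modified <- get(x_name, envir = caller)
  modified[[field]] <- value
  assign(x_name, modified, envir = caller)
  invisible(modified)
}

res <- list(orig_args = c("naw.sh", "hello"), output = NULL)
set_field(res, "output", c("hello", "All done."))
res$output  # now holds the two lines, with no explicit `res <- ...`
```

The caller never writes res <- ..., yet the binding changes, because assign() targets the frame that called the helper.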
So I can do
res <- run_sh(c("naw.sh", "hello"))
and if I check the contents of res, I have:
res
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
and if I immediately run:
check_result(res)
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> NULL
we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:
check_result(res)
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."
and without my explicitly assigning to res, it has been updated by the function:
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello" "naw 1" "naw 2" "naw 3" "naw 4" "naw 5"
#> [7] "All done."
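A possible alternative, offered only as a sketch, is to store the result in an environment rather than a list. Environments have reference semantics in R, so a function can cache the output on the object it receives without any name lookup in the calling frame. Everything below is hypothetical: new_result, get_output and make_reader are illustrative names, and make_reader fakes the read-once buffer so the example runs without processx.

```r
# Hypothetical constructor: an environment instead of a list, so every
# function that receives it sees (and can change) the same object.
new_result <- function(read_once) {
  res <- new.env()
  res$read_once <- read_once  # stand-in for process$read_all_output_lines()
  res$output <- NULL
  res
}

get_output <- function(.res) {
  # Read the buffer only on the first call, then serve the cached copy
  if (is.null(.res$output)) {
    .res$output <- .res$read_once()
  }
  .res$output
}

# A fake one-shot reader: returns its lines once, then an empty vector,
# mimicking the buffer behaviour described in the question.
make_reader <- function(lines) {
  used <- FALSE
  function() {
    if (used) return(character(0))
    used <<- TRUE
    lines
  }
}

res <- new_result(make_reader(c("hello", "All done.")))
first  <- get_output(res)
second <- get_output(res)
identical(first, second)  # the second call is served from the cache
```

In real use the stored function would be replaced by the process object and an is_alive() check. The trade-off is that copies of res made with <- all point at the same underlying object.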
Upvotes: 1
Reputation: 173793
I may be missing something here, but why don't you just write the output after you create the object so that it's there the first time the function returns?
run_sh <- function(.args, ...)
{
  p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
  return(list(process = p, orig_args = .args, output = p$read_all_output_lines()))
}
So now if you do
res <- run_sh(c("naw.sh", "hello"))
You get
res
#> $`process`
#> PROCESS 'sh', finished.
#>
#> $orig_args
#> [1] "naw.sh" "hello"
#>
#> $output
#> [1] "hello"
#> [2] "naw.sh: line 2: sleep: command not found"
#> [3] "naw 1"
#> [4] "naw.sh: line 4: sleep: command not found"
#> [5] "naw 2"
#> [6] "naw.sh: line 6: sleep: command not found"
#> [7] "naw 3"
#> [8] "naw.sh: line 8: sleep: command not found"
#> [9] "naw 4"
#> [10] "naw.sh: line 10: sleep: command not found"
#> [11] "naw 5"
#> [12] "All done."
Upvotes: 1