Reputation: 25
I have an R script with hundreds of lines that eventually gives me a single numerical answer at the end. Now I want to create a confidence interval, so I need to run this whole script many times and calculate the mean and standard deviation of the answers. But I do not want to wrap the whole thing in a 'for' loop, because that becomes really complicated.
After some research, I came across this method:
My final answer is stored in a variable named 'result'. In a new script file, I then run:
result_list<-lapply(1:10, function(n)source("my_script_file.R"))
result_list
(repeating 10 times, for example)
However, the final result looks like this:
[[1]]
[[1]]$value
[1] 136.9876
[[1]]$visible
[1] TRUE
[[2]]
[[2]]$value
[1] 138.4969
[[2]]$visible
[1] TRUE
[[3]]
[[3]]$value
[1] 0.2356484
[[3]]$visible
[1] TRUE
.
.
Now, I have no idea what the second element ($visible) means in every iteration. And how do I get the list of values? result_list$values doesn't work. I would also like to ignore values that are too small and could be simulation errors, like the third one here, so that I can calculate the mean and sd.
Also, is there any other way to repeat this process besides this method?
Upvotes: 2
Views: 11817
Reputation: 193517
I would recommend turning your script into a function, loading the function once, and then using replicate instead of lapply(1:n, ...).
Here's a very simple example:
Imagine you were working with a simple R script file that had the following contents:
## saved in working directory as "testfun.R"
myFun <- function(x, y, z) {
mean(rnorm(x)) + mean(rnorm(y)) + mean(rnorm(z))
}
myFun(10, 12, 14)
## End of "testfun.R" file
Now, compare the timing of sourcing the file 100 times with simply running the function 100 times:
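source("testfun.R")  # load myFun once (this also runs the file's last line once)
# fun1: call the already-loaded function n times
fun1 <- function(n = 10) replicate(n, myFun(10, 12, 14))
# fun2: re-read and re-evaluate the whole file n times, keeping each $value
fun2 <- function(n = 10) lapply(1:n, function(x) source("testfun.R")$value)
library(microbenchmark)
microbenchmark(fun1(100), fun2(100), unlist(fun2(100)), times = 1)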
## Unit: milliseconds
## expr min lq mean median uq max neval
## fun1(100) 3.064384 3.064384 3.064384 3.064384 3.064384 3.064384 1
## fun2(100) 59.635228 59.635228 59.635228 59.635228 59.635228 59.635228 1
## unlist(fun2(100)) 61.349713 61.349713 61.349713 61.349713 61.349713 61.349713 1
I'm not sure how much of a difference this would make in practice if most of the time is spent on the computation itself (rather than on reading and parsing the source file), but I would still consider a function plus replicate to be a cleaner, easier-to-read alternative.
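A minimal sketch of that approach applied to the question's setup, assuming the body of the long script can be pasted into a function. The name run_simulation and the rnorm() computation inside it are hypothetical placeholders for the real script; the only requirement is that the function's last expression is the single number you want.

```r
# Hypothetical wrapper: paste the body of the long script in here.
# The placeholder computation below only stands in for the real work.
run_simulation <- function() {
  x <- rnorm(50, mean = 137, sd = 5)
  result <- mean(x)
  result  # last expression = the function's return value
}

set.seed(42)                                 # make the runs reproducible
results <- replicate(100, run_simulation())  # numeric vector, one value per run

# Summary statistics across runs
run_mean <- mean(results)
run_sd   <- sd(results)

# Empirical 2.5% and 97.5% quantiles of the simulated outcomes
interval <- quantile(results, probs = c(0.025, 0.975))
print(c(mean = run_mean, sd = run_sd))
print(interval)
```

If the script stays in its own file, that file can simply define run_simulation() and be loaded once with source() before calling replicate(), the same way testfun.R defines myFun. The quantile interval describes the spread of individual runs; an interval for the mean of the runs would use the standard error instead.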
Upvotes: 2
Reputation: 887048
We can use $value to get the 'value' from each iteration:
lapply(1:10, function(n)source("my_script_file.R")$value)
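As for the second element: source() returns, invisibly, the value of the last expression it evaluated together with a flag saying whether that value would have auto-printed at the console (the same value/visible pair that withVisible() produces). The small demo below writes two throwaway scripts to temporary files; their contents are made up purely for illustration.

```r
# Write two tiny throwaway scripts to temporary files.
ends_with_value  <- tempfile(fileext = ".R")
ends_with_assign <- tempfile(fileext = ".R")
writeLines(c("result <- 2 + 3", "result"), ends_with_value)
writeLines("result <- 2 + 3", ends_with_assign)

# source() returns list(value = ..., visible = ...) for the last expression.
out1 <- source(ends_with_value)
out2 <- source(ends_with_assign)

str(out1)  # value 5, visible TRUE: a bare `result` would print at the console
str(out2)  # value 5, visible FALSE: assignments do not auto-print

# withVisible() produces the same kind of list directly.
str(withVisible(invisible(5)))

unlink(c(ends_with_value, ends_with_assign))  # clean up the temporary files
```

This is also why result_list$values returns NULL: result_list is an unnamed list whose elements are these value/visible pairs, so each value has to be taken out of its own element, as in the lapply() call above.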
As each result is a single element, it may also be useful to use sapply to get a vector as output:
v1 <- sapply(1:10, function(n)source("my_script_file.R")$value)
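A slightly stricter variant swaps sapply() for vapply(): vapply() requires you to declare the type and length of each result (here numeric(1)) and stops with an error if any run returns something else, instead of silently producing a list or a matrix. The temporary script below is a hypothetical stand-in for my_script_file.R.

```r
# Hypothetical stand-in for my_script_file.R: it ends with a single number.
script <- tempfile(fileext = ".R")
writeLines(c("x <- rnorm(20)", "result <- mean(x)", "result"), script)

set.seed(1)
# One numeric value per run; vapply() errors if a run returns anything else.
v1 <- vapply(1:10, function(n) source(script)$value, numeric(1))
print(v1)

unlink(script)  # clean up the temporary file
```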
We can then subset the vector to keep only values greater than a particular threshold, for example 0.5:
v1[v1 > 0.5]
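From there, the mean, standard deviation and an interval follow directly. The sketch below uses the three values printed in the question (only three runs, so the resulting numbers are not meaningful in themselves); the 0.5 cutoff is the same arbitrary example threshold, and whether dropping small runs is justified depends on why they occur. The t-based interval is a confidence interval for the mean of the kept runs.

```r
# The three values shown in the question's output (illustration only)
v1 <- c(136.9876, 138.4969, 0.2356484)

kept <- v1[v1 > 0.5]   # drop runs below the example threshold
n    <- length(kept)

m  <- mean(kept)
s  <- sd(kept)
se <- s / sqrt(n)      # standard error of the mean

# 95% t-based confidence interval for the mean of the kept runs
ci <- m + c(-1, 1) * qt(0.975, df = n - 1) * se
print(c(mean = m, sd = s, lower = ci[1], upper = ci[2]))
```

t.test(kept)$conf.int returns the same interval. If the goal is instead a range covering individual simulated outcomes, quantile(kept, c(0.025, 0.975)) is the usual choice.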
Upvotes: 0