Reputation: 5935
I am working with the R programming language. I got the following loop to run:
library(dplyr)
list_results <- list()
for (i in 1:100){
  c1_i = c2_i = c3_i = 0
  while(c1_i + c2_i + c3_i < 15 ){
    num_1_i = sample_n(iris, 30)
    num_2_i = sample_n(iris, 30)
    num_3_i = sample_n(iris, 30)
    c1_i = mean(num_1_i$Sepal.Length)
    c2_i = mean(num_2_i$Sepal.Length)
    c3_i = mean(num_3_i$Sepal.Length)
    ctotal_i = c1_i + c2_i + c3_i
    combined_i = rbind(num_1_i, num_2_i, num_3_i)
    nrow_i = nrow(unique(combined_i[duplicated(combined_i), ]))
  }
  inter_results_i <- data.frame(i, c1_i, c2_i, c3_i, nrow_i, ctotal_i)
  list_results[[i]] <- inter_results_i
}
Now, I want to try and add a second condition to this loop. Using this post as a reference (How to have two conditions in a While loop?), I tried to do this as follows:
list_results <- list()
for (i in 1:100){
  c1_i = c2_i = c3_i = ctotal_i = 0
  while(c1_i + c2_i + c3_i < 15 && nrow_i == 0 ) {
    num_1_i = sample_n(iris, 30)
    num_2_i = sample_n(iris, 30)
    num_3_i = sample_n(iris, 30)
    c1_i = mean(num_1_i$Sepal.Length)
    c2_i = mean(num_2_i$Sepal.Length)
    c3_i = mean(num_3_i$Sepal.Length)
    ctotal_i = c1_i + c2_i + c3_i
    combined_i = rbind(num_1_i, num_2_i, num_3_i)
    nrow_i = nrow(unique(combined_i[duplicated(combined_i), ]))
  }
  inter_results_i <- data.frame(i, c1_i, c2_i, c3_i, ctotal_i, nrow_i)
  list_results[[i]] <- inter_results_i
}
But for some reason, this is always producing an "empty" list.
Can someone please show me what I am doing wrong and how to fix this?
Thanks!
Upvotes: 0
Views: 423
Reputation: 50738
Here is an attempt at optimising your code using vectorised functions. I have also renamed your variables to be more descriptive.
# Set fixed seed for reproducibility
set.seed(2020)

sample_function <- function(sum_of_mean_thresh = 15, n_dupes_thresh = 10) {
  # Still uses a `while` loop, but each iteration works on row indices only
  sum_of_mean <- 0
  n_dupes <- 0
  sample_idx <- matrix()
  # `&&` is the scalar (short-circuit) operator; the loop stops as soon as
  # either threshold is reached
  while (sum_of_mean < sum_of_mean_thresh && n_dupes < n_dupes_thresh) {
    # 30x3 matrix: each column holds the row indices of one sample
    sample_idx <- replicate(3L, sample(nrow(iris), 30L))
    # Mean of Sepal.Length per sample (column), summed over the 3 samples
    sum_of_mean <- sum(apply(sample_idx, 2, function(idx) mean(iris$Sepal.Length[idx])))
    # Number of row indices that occur more than once across the 3 samples
    n_dupes <- sum(duplicated(as.integer(sample_idx)))
  }
  # Return:
  # - 30x3 matrix of row indices for each of the 3 samples
  # - the sum of the mean of the sampled iris$Sepal.Length
  # - the number of duplicated row indices across all 3x30 samples
  list(sample_idx = sample_idx, sum_of_mean = sum_of_mean, n_dupes = n_dupes)
}

# Execute the sample function 100 times and return a `list`
# (with every element being a `list` returned from `sample_function()`)
replicate(100, sample_function(), simplify = FALSE)
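`sample_function()` should be faster than the original code, since it samples row indices rather than whole data frames and counts duplicates on an integer vector instead of calling `rbind()` and `unique()` on data frames.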
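As for why the second loop in the question returns only zeros, the most likely cause is that `nrow_i` is never initialised inside the `for` loop. If a nonzero value is left over from running the first script, `nrow_i == 0` is already FALSE when the `while` is first reached, so its body never runs and the `c*_i` columns keep their initial zeros. In a fresh session the same line would instead stop with an "object 'nrow_i' not found" error. Below is a minimal sketch of the fix. It keeps the question's variable names but swaps `sample_n()` for base-R indexing so that it runs without dplyr; that swap, the seed and the name `n_runs` are assumptions of this sketch, not part of the original code.

```r
set.seed(2020)  # arbitrary seed, only so the sketch is reproducible
n_runs <- 100   # hypothetical name for the number of repetitions

list_results <- vector("list", n_runs)
for (i in seq_len(n_runs)) {
  # Reset every loop variable, including nrow_i, at the start of each run
  # so that no stale value from an earlier script leaks into the condition
  c1_i <- c2_i <- c3_i <- ctotal_i <- 0
  nrow_i <- 0

  # Same condition as in the question: keep resampling only while BOTH
  # sub-conditions hold, i.e. stop as soon as either one fails
  while (ctotal_i < 15 && nrow_i == 0) {
    # Base-R stand-in for sample_n(iris, 30): 30 rows without replacement
    num_1_i <- iris[sample(nrow(iris), 30), ]
    num_2_i <- iris[sample(nrow(iris), 30), ]
    num_3_i <- iris[sample(nrow(iris), 30), ]

    c1_i <- mean(num_1_i$Sepal.Length)
    c2_i <- mean(num_2_i$Sepal.Length)
    c3_i <- mean(num_3_i$Sepal.Length)
    ctotal_i <- c1_i + c2_i + c3_i

    combined_i <- rbind(num_1_i, num_2_i, num_3_i)
    # Number of distinct rows that occur more than once in the combined sample
    nrow_i <- nrow(unique(combined_i[duplicated(combined_i), ]))
  }

  list_results[[i]] <- data.frame(i, c1_i, c2_i, c3_i, ctotal_i, nrow_i)
}

# Collapse the list into one data frame with one row per run
results <- do.call(rbind, list_results)
head(results)
```

Note that with `&&` the loop ends as soon as either sub-condition fails. If the intent was instead to keep resampling until the sum has reached 15 and at least one duplicated row has appeared (an assumption about the goal, which the question does not state), the condition would need to be `ctotal_i < 15 || nrow_i == 0`.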
Upvotes: 1