I am building a simulation to randomly assign character labels to subarrays in R based on user-defined parameters.
My code is as follows
K <- 2 ### Number of subarrays
K1 <- c(1:3) ### labels in first subarray
K2 <- c(4:5) ### labels in second subarray
N <- 10
Hstar <- 5
perms <- 10 ### rows in each subarray
specs <- 1:N
specs1 <- 1:(N/2) ### specs in subarray 1
specs2 <- ((N/2) + 1):N ### specs in subarray 2
pop <- array(dim = c(c(perms, N/K), K)) ### population subarrays
haps <- as.character(1:Hstar) ### character labels
probs <- rep(1/Hstar, Hstar) ### label probabilities
### 'for' loop to randomly populate 'pop' with 'haps' according to 'probs'
for(j in 1:perms){
  for(i in 1:K){
    if(i == 1){
      pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
    }
    else{
      pop[j, specs1, 1] <- sample(haps[K1], size = N/2, replace = TRUE, prob = probs[K1])
      pop[j, specs2, 2] <- sample(haps[K2], size = N/2, replace = TRUE, prob = probs[K2])
    }
  }
}
What I want to do is populate 'pop' (by rows, not columns), which consists of two subarrays, with character labels ('haps'). Specifically, subarray 1 must contain only labels from K1 and subarray 2 must contain only labels from K2. 'pop' has dimension 10 x 5 x 2 (50 values in subarray 1 and the remaining 50 in subarray 2). Unfortunately, R throws the error
Error in `[<-`(`*tmp*`, j, specs, i, value = c("4", "1", "3", "4", "1", :
subscript out of bounds
when the nested 'for' loop is run, and I can't seem to understand why. I believe it has to do with specs, specs1, specs2. Basically, the values from 'specs' are divided between 'specs1' and 'specs2'. However, the error suggests that the issue lies in pop[j, specs, i], but since K = 2, this part of the program should not be affected... and yet it is.
Any ideas on how to fix the issue so that the program runs for ANY value of K?
Please let me know if more clarification is needed.
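A quick check of the objects involved shows where the indices and the array disagree. This is a minimal sketch reusing the question's values; visited is only an illustrative name. Note that for (i in 1:K) visits i = 1 first, so the if (i == 1) branch runs on the first pass even when K = 2.

```r
K <- 2; N <- 10; perms <- 10
pop <- array(dim = c(perms, N / K, K))
specs <- 1:N

print(dim(pop))        # dimensions 10 x 5 x 2
print(length(specs))   # 10 indices, but dim(pop)[2] is only 5 columns

# record which values of i the loop actually visits
visited <- integer(0)
for (i in 1:K) visited <- c(visited, i)
print(visited)         # 1 then 2: the i == 1 branch is executed
```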
R is very efficient with vectorised operations. You can use this feature to avoid the for loop entirely.
To make the code work, I needed to correct a few errors:
specs refers to the first dimension of your array (its 10 rows), not the second. specs1 and specs2 refer to the rows of the second subarray (i = 2 in your example), and I modified the code accordingly. To fill the arrays, I generate a sample whose size matches the part of the array being filled, using length and dim for that. Note that an array is filled by column, i.e. the first column for every row, then the second column, and so on.
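The column-major fill order can be seen on a tiny example. This sketch (m and pop are illustrative names) fills a 2 x 3 matrix from 1:6 and checks that specs has the same length as the first dimension of the question's array, which is why it belongs in the row position of the index.

```r
# a 2 x 3 matrix filled from the vector 1:6: values go down each column first
m <- array(1:6, dim = c(2, 3))
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# the question's array: specs (1:10) matches dimension 1 (rows), not dimension 2
pop <- array(dim = c(10, 5, 2))
specs <- 1:10
print(length(specs) == dim(pop)[1])
```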
K <- 2 ### Number of subarrays
K1 <- c(1:3) ### labels in first subarray
K2 <- c(4:5) ### labels in second subarray
N <- 10
Hstar <- 5
perms <- 10 ### rows in each subarray
specs <- 1:N
specs1 <- 1:(N/2) ### specs in subarray 1
specs2 <- ((N/2) + 1):N ### specs in subarray 2
pop <- array(dim = c(c(perms, N/K), K)) ### population subarrays
haps <- as.character(1:Hstar) ### character labels
probs <- rep(1/Hstar, Hstar) ### label probabilities
pop[specs, , 1] <- sample(haps, size = length(specs) * dim(pop)[2], replace = TRUE, prob = probs)
pop[specs1, , 2] <- sample(haps[K1], size = length(specs1) * dim(pop)[2], replace = TRUE, prob = probs[K1])
pop[specs2, , 2] <- sample(haps[K2], size = length(specs2) * dim(pop)[2], replace = TRUE, prob = probs[K2])
pop
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] "4" "3" "2" "3" "2"
#> [2,] "5" "4" "3" "1" "4"
#> [3,] "1" "3" "4" "3" "5"
#> [4,] "3" "3" "5" "5" "3"
#> [5,] "2" "4" "3" "4" "4"
#> [6,] "3" "3" "2" "4" "1"
#> [7,] "5" "1" "4" "4" "1"
#> [8,] "4" "3" "2" "3" "2"
#> [9,] "3" "2" "3" "3" "1"
#> [10,] "3" "4" "1" "4" "2"
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] "3" "3" "2" "1" "3"
#> [2,] "2" "2" "2" "2" "2"
#> [3,] "2" "2" "2" "2" "1"
#> [4,] "2" "3" "2" "3" "1"
#> [5,] "1" "2" "2" "3" "2"
#> [6,] "5" "5" "5" "4" "5"
#> [7,] "4" "5" "4" "5" "5"
#> [8,] "5" "5" "4" "5" "5"
#> [9,] "4" "5" "5" "4" "4"
#> [10,] "5" "4" "5" "5" "4"
I think you can build on this to parametrise the code for any value of K.
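One way to build on this for any K, assuming the goal stated in the question (subarray k contains only labels from its own group), is to keep the label groups in a list and fill each subarray with a single vectorised sample() call. fill_subarrays, label_groups, groups and n_col are hypothetical names, and this sketch does not reproduce the row split used above inside subarray 2.

```r
# Hypothetical helper: returns a perms x n_col x K character array in which
# subarray k holds labels drawn only from haps[label_groups[[k]]]
fill_subarrays <- function(haps, probs, label_groups, perms, n_col) {
  K <- length(label_groups)
  pop <- array(NA_character_, dim = c(perms, n_col, K))
  for (k in seq_len(K)) {
    idx <- label_groups[[k]]
    # one draw of perms * n_col labels, written into subarray k column by
    # column; sample() treats prob as weights, so probs[idx] need not sum to 1
    pop[, , k] <- sample(haps[idx], size = perms * n_col,
                         replace = TRUE, prob = probs[idx])
  }
  pop
}

Hstar <- 5
haps <- as.character(1:Hstar)
probs <- rep(1 / Hstar, Hstar)
groups <- list(1:3, 4:5)   # K = 2 here; add list elements for a larger K
pop <- fill_subarrays(haps, probs, groups, perms = 10, n_col = 5)
```

Because K is taken from length(groups), the same call works unchanged for three or more label groups.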
Let me break the error down. The problem is the assignment pop[j, specs, i] in your loop, which indexes the wrong dimension.
There is also an inconsistency: you loop over the rows (10 iterations, j in 1:perms), but each row has only 5 elements (5 columns). I suspect you meant to loop over the columns, in which case j should run over the 5 columns rather than over 1:perms.
To see the issue, evaluate the loop body one element at a time. For j = 1 and i = 1, pop[j, specs, i] refers to pop[1, 1:10, 1], but each subarray has dimension 10 x 5, so columns 6 to 10 do not exist. You need to index a column instead, e.g. pop[, 1, 1] (there is no need to write 1:10, since an empty index selects the whole column).
pop[j, specs, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
sample(haps, size = N, replace = TRUE, prob = probs)
# [1] "3" "1" "4" "3" "2" "1" "1" "1" "2" "2"
pop[j, specs, i]
# Error in pop[j, specs, i] : subscript out of bounds
pop[specs, j, i]
# [1] "5" "2" "1" "4" "3" "5" "1" "5" "5" "2"
pop[, j, i] <- sample(haps, size = N, replace = TRUE, prob = probs)
pop[, , 1]
# [,1] [,2] [,3] [,4] [,5]
# [1,] "5" NA NA NA NA
# [2,] "1" NA NA NA NA
# [3,] "4" NA NA NA NA
# [4,] "1" NA NA NA NA
# [5,] "1" NA NA NA NA
# [6,] "2" NA NA NA NA
# [7,] "5" NA NA NA NA
# [8,] "5" NA NA NA NA
# [9,] "3" NA NA NA NA
#[10,] "3" NA NA NA NA
The same problem occurs in the else branch. The corrected lines are:
pop[specs1 , j, 2] <- sample(haps[K1], size = N/2, replace = TRUE, prob = probs[K1])
pop[specs2 , j, 2] <- sample(haps[K2], size = N/2, replace = TRUE, prob = probs[K2])
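Putting these corrections together gives a sketch of the full loop under the question's parameter values. The outer loop now runs over the columns of each subarray, seq_len(dim(pop)[2]), instead of 1:perms. Like the original, it only handles K = 2.

```r
K <- 2; K1 <- 1:3; K2 <- 4:5; N <- 10; Hstar <- 5; perms <- 10
specs1 <- 1:(N/2)            # rows of subarray 2 that take labels from K1
specs2 <- ((N/2) + 1):N      # rows of subarray 2 that take labels from K2
pop <- array(dim = c(perms, N/K, K))
haps <- as.character(1:Hstar)
probs <- rep(1/Hstar, Hstar)

for (j in seq_len(dim(pop)[2])) {   # loop over the 5 columns, not the 10 rows
  for (i in 1:K) {
    if (i == 1) {
      # a whole column of subarray 1 (perms = N = 10 values)
      pop[, j, 1] <- sample(haps, size = N, replace = TRUE, prob = probs)
    } else {
      pop[specs1, j, 2] <- sample(haps[K1], size = N/2, replace = TRUE, prob = probs[K1])
      pop[specs2, j, 2] <- sample(haps[K2], size = N/2, replace = TRUE, prob = probs[K2])
    }
  }
}
```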
That said, there is a better way to do this:
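# fill each column of subarray 1 with labels drawn from all of 'haps'
pop[, , 1] <- apply(pop[, , 1], 2, function(x)
  sample(haps, size = N, replace = TRUE, prob = probs))
# rows specs1 of subarray 2: labels from K1 only
pop[specs1, , 2] <- apply(pop[specs1, , 2], 2, function(x)
  sample(haps[K1], size = N/2, replace = TRUE, prob = probs[K1]))
# rows specs2 of subarray 2: labels from K2 only
pop[specs2, , 2] <- apply(pop[specs2, , 2], 2, function(x)
  sample(haps[K2], size = N/2, replace = TRUE, prob = probs[K2]))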
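Because the anonymous function passed to apply() ignores its argument x, apply() is effectively just a loop over the columns. A possible alternative, swapping in base R's replicate() (this is not part of the answer above), evaluates the sample() call once per column and binds the results into a matrix. n_col is an illustrative name.

```r
N <- 10; K1 <- 1:3; K2 <- 4:5; Hstar <- 5
haps <- as.character(1:Hstar)
probs <- rep(1/Hstar, Hstar)
specs1 <- 1:(N/2)
specs2 <- ((N/2) + 1):N
pop <- array(dim = c(10, 5, 2))
n_col <- dim(pop)[2]

# replicate(n, expr) returns a length(sample) x n matrix, one column per draw
pop[, , 1] <- replicate(n_col, sample(haps, size = N, replace = TRUE, prob = probs))
pop[specs1, , 2] <- replicate(n_col, sample(haps[K1], size = N/2, replace = TRUE, prob = probs[K1]))
pop[specs2, , 2] <- replicate(n_col, sample(haps[K2], size = N/2, replace = TRUE, prob = probs[K2]))
```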